Abstract-Modern society is increasingly surrounded by, and accustomed to, a wide range of Cyber-Physical Systems (CPS), Internet-of-Things (IoT), and smart devices. They often perform safety-critical functions, e.g., personal medical devices, automotive CPS and industrial automation (smart factories). Some devices are small, cheap and specialized sensors and/or actuators. They tend to run simple software and operate under control of a more sophisticated central control unit. The latter is responsible for decision-making and for orchestrating the entire system. If devices are left unprotected, consequences of forged sensor readings or ignored actuation commands can be catastrophic, particularly in safety-critical settings. This prompts the following three questions: (1) How to trust data produced by a simple remote embedded device? (2) How to ascertain that this data was produced via execution of expected software? (3) Is it possible to attain (1) and (2) under the assumption that all software on the remote device could be modified or compromised?
I. INTRODUCTION
The number and variety of special-purpose computing devices has been increasing dramatically. This includes all kinds of embedded devices, cyber-physical systems (CPS) and Internet-of-Things (IoT) gadgets, utilized in various "smart" or instrumented settings, such as homes, offices, factories, automotive systems and public venues. Tasks performed by these devices are often safety-critical. For example, a typical industrial control system depends on physical measurements (e.g., temperature, pressure, humidity, speed) reported by sensors, and on actions taken by actuators, e.g., turning on the A/C, sounding an alarm, or reducing speed.
A cyber-physical control system is usually composed of multiple sensors and actuators, in the form of low-cost microcontroller units (MCUs). Such devices run simple software (often on "bare metal") and operate under control of a remote central control unit. Despite their potential importance to overall system functionality, low-end MCUs are typically designed to minimize cost, size and energy consumption, e.g., TI MSP430. Therefore, their architectural security is often primitive or non-existent, thus making them vulnerable to malware infestations and other malicious software modifications. A compromised MCU can, for instance, spoof sensed quantities or ignore actuation commands, leading to potentially catastrophic results. For example, in a smart city, large-scale erroneous reports of electricity consumption by smart meters might lead to power outages. A medical device that returns incorrect values when queried by a remote physician might result in a wrong drug being prescribed to a patient. A compromised car engine temperature sensor that reports incorrect (low) readings can lead to undetected overheating and major damage. However, despite the very real risks of remote software compromise, most people tend to believe that these devices execute the expected software and thus perform their expected function.
In this paper, we argue that Proofs of Execution (PoX) are both important and necessary for securing low-end MCUs. Specifically, we demonstrate in Section VII that PoX schemes can be used to construct sensors and actuators that "cannot lie", even under the assumption of full software compromise. In a nutshell, a PoX conveys that the remote (and possibly compromised) device really executed specific software, and all execution results are authenticated and cryptographically bound to this execution. This functionality is similar to authenticated outputs that can be produced by software execution in SGX-like architectures [1], [2]. However, such architectures are comparatively heavy-weight and unsuitable for low-end MCUs; see Section I-A for further details on targeted devices.
One of our main building blocks in designing PoX schemes is Remote Attestation (RA). Basically, RA is a means to detect malware on a remote low-end MCU. It allows a trusted verifier (Vrf) to remotely measure memory contents (or software state) of an untrusted embedded device (Prv). RA is usually realized as a 2-message challenge-response protocol:
1) Vrf sends an attestation request containing a challenge (Chal) to Prv. It might also contain a token derived from a secret (shared by Vrf and Prv) that allows Prv to authenticate Vrf.
2) Prv receives the attestation request and computes an authenticated integrity check over its memory and Chal. The memory region might be either pre-defined, or explicitly specified in the request.
3) Prv returns the result to Vrf.
4) Vrf receives the result, and checks whether it corresponds to a valid memory state.

The authenticated integrity check is typically realized as a Message Authentication Code (MAC) computed over Prv memory. We overview one concrete RA architecture in Section III.
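The four protocol steps above can be illustrated with a minimal Python sketch. It is not the implementation of any particular RA architecture: the function names, the use of HMAC-SHA256, the 32-byte challenge, and the simple `Chal || memory` MAC input are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def vrf_make_request() -> bytes:
    """Step 1: Vrf picks a fresh random challenge Chal (32 bytes, assumed size)."""
    return secrets.token_bytes(32)


def prv_attest(key: bytes, chal: bytes, memory: bytes) -> bytes:
    """Step 2: Prv computes a MAC over Chal and the attested memory region."""
    return hmac.new(key, chal + memory, hashlib.sha256).digest()


def vrf_verify(key: bytes, chal: bytes, expected_memory: bytes, response: bytes) -> bool:
    """Step 4: Vrf recomputes the MAC over the memory state it expects."""
    expected = hmac.new(key, chal + expected_memory, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


# Hypothetical shared secret and memory contents, for illustration only.
key = secrets.token_bytes(32)
firmware = b"\x01\x02\x03\x04"
chal = vrf_make_request()
response = prv_attest(key, chal, firmware)  # Step 3: returned to Vrf
honest_ok = vrf_verify(key, chal, firmware, response)
tampered_ok = vrf_verify(key, chal, firmware, prv_attest(key, chal, b"\x01\x02\x03\xff"))
```

Because the challenge is fresh for every request, a response recorded for an earlier challenge does not verify against a later one.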
Despite major progress and several proposals for RA architectures with different assumptions and guarantees [3]-[17], RA alone is insufficient to obtain proofs of execution. RA allows Vrf to ascertain integrity of software residing in Prv's attested memory region. However, RA by itself offers no guarantee that malware is not present elsewhere in Prv's memory. It also does not guarantee that the attested software is ever executed or that any such execution completes successfully. Even if the attested software is executed, there is no guarantee that it has not been modified (e.g., by malware residing elsewhere in memory) in the time between its execution and its attestation. This phenomenon is well known as the RA Time-Of-Check-Time-Of-Use (TOCTOU) problem. Finally, RA does not guarantee authenticity and integrity of any output produced by the execution of the attested software.
To bridge this gap, we design and implement VAPE: Verified Architecture for Proofs of Execution. In addition to RA, VAPE allows Vrf to request an unforgeable proof that the attested software executed successfully and (optionally) produced certain authenticated output. These guarantees hold even in case of full software compromise on Prv. Our intended contributions are:
-New security service: we design and implement VAPE for unforgeable remote proofs of execution (PoX). VAPE is built on top of VRASED [17] , a formally verified hybrid RA architecture. VAPE overhead vis-a-vis VRASED is small and, to the best of our knowledge, it is the first security architecture for proofs of remote software execution on low-end devices.
-Provable security & implementation verification: We prove that the composition of VRASED with VAPE yields a secure PoX architecture. All security properties expected from VAPE are formally specified using Linear Temporal Logic (LTL) and VAPE modules are verified to adhere to these properties.
-Evaluation, publicly available implementation and applications: VAPE was implemented on a real-world low-end MCU (TI MSP430) and deployed using commodity FPGAs. Its design (along with verification) is publicly available at [18]. Our evaluation demonstrates low hardware overhead, which we consider affordable even for low-end MCUs. The implementation is accompanied by a sample PoX application (see Section VII). In particular, we use VAPE to construct trustworthy safety-critical devices. On such a device, even if malware has full control of the software, it cannot spoof measurements (or fake performing actuation) without detection.
A. Targeted Devices & Scope
This work focuses on CPS/IoT sensors and actuators with relatively low computing power. These are some of the lowest-end devices, based on low-power single-core MCUs with only a few KBytes of program and data memory. A representative of this class of devices is the Texas Instruments MSP430 MCU family [19]. It has a 16-bit word size, resulting in ≈ 64 KBytes of addressable memory. SRAM is used as data memory and its size ranges between 4 and 16 KBytes (depending on the specific MSP430 model), while the rest of the address space is used for program memory, e.g., ROM and Flash. MSP430 is a Von Neumann architecture processor with common data and code address spaces. It has no memory management unit (MMU) and thus no virtual memory; instead, MSP430 accesses memory directly using physical addresses. Multiple memory accesses can be performed within a single instruction; its instruction execution time varies from 1 to 6 clock cycles, and instruction length varies from 16 to 48 bits. MSP430 was designed for low power and low cost. It is widely used in many application domains, e.g., automotive industry, utility meters, as well as consumer devices and computer peripherals. Our choice is also motivated by availability of a well-maintained open-source MSP430 hardware design from Open Cores [20]. Nevertheless, our machine model is applicable to other low-end MCUs in the same class as MSP430 (e.g., Atmel AVR ATMega).
B. Organization
Section II discusses related work on remote attestation, formal verification of security services and control flow attestation. Section III provides some background on automated verification, and on VRASED's RA architecture. Section IV introduces Proofs of Execution (PoX), followed by a realization thereof in Section V, including technical details of VAPE design, as well as the adversarial model and assumptions. Section VI presents VAPE's formal verification. Next, in Section VII, we describe how to use VAPE to implement authenticated sensing/actuation. Section VIII concludes the paper with a summary of results.
II. RELATED WORK
Remote Attestation (RA)-architectures can be divided into three categories: hardware-based, software-based, or hybrid. Hardware-based RA [21]-[23] relies on dedicated secure hardware components, e.g., Trusted Platform Modules (TPMs) [24]. However, the cost of such hardware is normally prohibitive for low-end IoT/CPS devices. Software-based attestation [25]-[27] requires no hardware security features but imposes strong security assumptions about communication between Prv and Vrf, which are unrealistic in the IoT/CPS ecosystem. (It is, however, the only choice for legacy devices.) Hybrid RA [8], [9], [28]-[30] aims to achieve security equivalent to hardware-based mechanisms at minimal cost. It thus entails minimal hardware requirements while relying on software to reduce overall complexity and RA footprint on Prv.
The first hybrid RA architecture, SMART [6], acknowledged the importance of executing code on Prv, in addition to just attesting Prv's memory. Using an attest-then-execute approach, SMART attempted to achieve software execution guarantees by specifying the address of the first instruction to be executed after completion of attestation. We consider this to be a best-effort approach which merely guarantees that the code will start executing. It does not guarantee that execution completes successfully. For example, SMART's approach cannot detect if execution is interrupted and never resumed. It also cannot detect when a reset (e.g., due to software bugs, or Prv running low on power) happens in the middle of execution, preventing its completion. Furthermore, direct memory access (DMA) may happen during execution and it can modify the code being executed or its output. In other words, SMART offers no guarantees beyond "invoking the executable".
Another notable RA architecture is TrustLite [7], which builds upon SMART to allow secure interrupts. However, TrustLite does not enforce temporal consistency of attested memory, and is thus conceptually vulnerable to self-relocating malware and memory modification during attestation [31]. Consequently, deriving a secure PoX scheme from TrustLite is challenging. Several other prominent low-to-medium-end RA architectures (e.g., SANCUS [11], HYDRA [9], and TyTaN [8]) do not offer PoX. In this paper, we show that the execute-then-attest approach, built on top of a temporally consistent RA architecture, provides unforgeable proofs of execution that are produced only if execution completes successfully.

Control Flow Attestation (CFA)-In contrast with RA, which measures Prv's software integrity, CFA techniques [32]-[35] provide Vrf with a measurement of the exact control flow path taken during execution of specific software on Prv. Such a measurement allows Vrf to detect run-time attacks. We believe that it is possible to construct a PoX scheme that relies on CFA to produce proofs of execution based on the attested control flow path. However, in this paper, we advocate a different approach, specific to proofs of execution, for two main reasons:
• CFA requires heavy-weight hardware (e.g., TrustZone in [32], branch monitor and hash engine in [33], [35]) to attest executed instructions in real time, along with memory addresses and the program counter. Such hardware components are not viable for low-end devices, since their cost (in terms of price, size, and energy consumption) is typically higher than the cost of a low-end MCU itself. For example, the cheapest Trusted Platform Module (TPM) [24] is about 10× more expensive than the MSP430 MCU itself¹. As discussed in Appendix VIII, current CFA architectures are also more expensive than the MCU [20] itself.
• CFA assumes that Vrf can enumerate a large (potentially exponential) number of valid control flow paths for a given program, and verify a valid response for each. This burden is unnecessary for determining if a proof of execution is valid, because one does not need to know the exact execution path in order to determine if execution occurred (and terminated) successfully.
¹ Source: https://www.digikey.com/

Instead of relying on CFA, our work introduces the concepts of ephemeral immutability and ephemeral atomicity. We use them to show how to construct a provably secure PoX architecture. Our VAPE architecture is non-invasive (it does not modify MCU behavior and semantics) and has low hardware overhead (around 2% for registers and 12% for LUTs). Also, Vrf is not required to enumerate valid control flow graphs, and the verification burden of PoX is exactly the same as the effort to verify a typical RA response for the same code.
Formally Verified Security Services-In recent years, several efforts focused on formally verifying security-critical systems.
In terms of cryptographic primitives, Hawblitzel et al. [36] verified implementations of SHA, HMAC, and RSA. Bond et al. [37] verified an assembly implementation of SHA-256, Poly1305, AES and ECDSA. Zinzindohoué et al. [38] developed HACL*, a verified cryptographic library containing the entire cryptographic API of NaCl [39]. Larger security-critical systems have also been successfully verified. Bhargavan [40] implemented the TLS protocol with verified cryptographic security. CompCert [41] is a C compiler that is formally verified to preserve C code semantics in generated assembly code. Klein et al. [42] designed and proved functional correctness of the seL4 microkernel. More recently, VRASED [17] realized a verified hybrid RA architecture. The VAPE architecture proposed in this paper builds upon VRASED's formally verified properties (see Section III-B for details) and adds further properties to obtain PoX. Our implementation is also formally verified to guarantee such properties.
III. BACKGROUND
This section provides some background on formal verification and overviews VRASED.
A. Formal Verification, Model Checking & Linear Temporal Logic
Computer-aided formal verification typically involves three basic steps. First, the system of interest (e.g., hardware, software, communication protocol) is described using a formal model, e.g., a Finite State Machine (FSM). Second, properties that the model should satisfy are formally specified. Third, the system model is checked against formally specified properties to guarantee that the system retains them. This can be achieved via either Theorem Proving or Model Checking. In this work, we use the latter to verify the implementation of system modules, and the former to derive new properties from subproperties that were proved for the modules' implementation.
In one instantiation of model checking, properties are specified as formulae using Temporal Logic (TL) and system models are represented as FSMs. Hence, a system is represented by a triple (S, S_0, T), where S is a finite set of states, S_0 ⊆ S is the set of possible initial states, and T ⊆ S × S is the transition relation; it describes the set of states that can be reached in a single step from each state. The use of TL to specify properties allows representation of expected system behavior over time.
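As a rough illustration of the (S, S_0, T) representation, the following Python sketch explores the states reachable from S_0 and checks a simple invariant on each of them. The four-state toy FSM and the helper names are hypothetical, and explicit enumeration is only a stand-in for what a model checker does; it does not reflect how NuSMV is implemented.

```python
from collections import deque


def reachable_states(initial, transitions):
    """Breadth-first exploration of all states reachable from the initial set."""
    successors = {}
    for src, dst in transitions:
        successors.setdefault(src, set()).add(dst)
    seen = set(initial)
    queue = deque(initial)
    while queue:
        state = queue.popleft()
        for nxt in successors.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def find_violation(initial, transitions, invariant):
    """Return some reachable state that breaks the invariant, or None."""
    for state in sorted(reachable_states(initial, transitions)):
        if not invariant(state):
            return state
    return None


# Hypothetical toy model: "fault" is a state of the FSM but has no incoming edge.
S = {"idle", "run", "done", "fault"}
S0 = {"idle"}
T = {("idle", "run"), ("run", "done"), ("run", "idle"), ("done", "idle")}

never_faults = find_violation(S0, T, lambda s: s != "fault")
```

Here `never_faults` is `None` because no transition leads to `"fault"`; adding the edge `("run", "fault")` to `T` makes the same check return the offending state.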
We apply the model checker NuSMV [43] , which can be used to verify generic HW or SW models. For digital hardware described at Register Transfer Level (RTL) -which is the case in this work -conversion from Hardware Description Language (HDL) to NuSMV model specification is simple. Furthermore, it can be automated [44] , because the standard RTL design already relies on describing hardware as an FSM.
In NuSMV, properties are specified in Linear Temporal Logic (LTL), which is particularly useful for verifying sequential systems, since LTL extends common logic statements with temporal clauses. In addition to propositional connectives, such as conjunction (∧), disjunction (∨), negation (¬), and implication (→), LTL includes temporal connectives, thus enabling sequential reasoning. In this paper, we are interested in the following temporal connectives:
• Xφ -neXt φ: holds if φ is true at the next system state.
• Fφ -Future φ: holds if there exists a future state where φ is true.
• Gφ -Globally φ: holds if for all future states φ is true.
• φ U ψ -φ Until ψ: holds if there is a future state where ψ holds and φ holds for all states prior to that.
• φ B ψ -φ Before ψ: holds if the existence of state where ψ holds implies the existence of an earlier state where φ holds. This connective can be expressed using U through the equivalence: φ B ψ ≡ ¬(¬φ U ψ).

This set of temporal connectives combined with propositional connectives (with their usual meanings) allows us to specify powerful rules. NuSMV works by checking LTL specifications against the system FSM for all reachable states in such FSM.
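To make the connectives concrete, the sketch below evaluates them over finite traces in Python and checks the stated equivalence φ B ψ ≡ ¬(¬φ U ψ) exhaustively on short traces. This is a simplification: LTL is defined over infinite executions, so the finite-trace reading (for example, X being false at the last state) is an assumption of this sketch, and all helper names are hypothetical.

```python
from itertools import product

# A formula is a function (trace, i) -> bool; a trace is a sequence of sets of atoms.


def atom(name):
    return lambda trace, i: name in trace[i]


def Not(f):
    return lambda trace, i: not f(trace, i)


def X(f):
    # neXt: f holds at the following state (false at the end of a finite trace).
    return lambda trace, i: i + 1 < len(trace) and f(trace, i + 1)


def F(f):
    # Future: f holds at some state from position i onwards.
    return lambda trace, i: any(f(trace, j) for j in range(i, len(trace)))


def G(f):
    # Globally: f holds at every state from position i onwards.
    return lambda trace, i: all(f(trace, j) for j in range(i, len(trace)))


def U(f, g):
    # Until: g holds at some state j >= i and f holds at every state in [i, j).
    return lambda trace, i: any(
        g(trace, j) and all(f(trace, k) for k in range(i, j))
        for j in range(i, len(trace))
    )


def B_direct(f, g):
    # Before: every state where g holds has an earlier state where f holds.
    return lambda trace, i: all(
        any(f(trace, k) for k in range(i, j))
        for j in range(i, len(trace))
        if g(trace, j)
    )


def B_via_until(f, g):
    return Not(U(Not(f), g))


def all_traces(atoms, max_len):
    """Enumerate every trace of length 1..max_len over the given atoms."""
    states = [
        frozenset(a for a, bit in zip(atoms, bits) if bit)
        for bits in product([False, True], repeat=len(atoms))
    ]
    for n in range(1, max_len + 1):
        yield from product(states, repeat=n)


phi, psi = atom("p"), atom("q")
equivalent = all(
    B_direct(phi, psi)(t, 0) == B_via_until(phi, psi)(t, 0)
    for t in all_traces(["p", "q"], 4)
)
```

The exhaustive comparison covers every trace of up to four states over two atoms, which is enough to exercise the cases where ψ never holds, holds first, or holds only after φ.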
B. VRASED Architecture
VRASED [17] is a formally verified hybrid (hardware/software co-design) RA architecture. It was built as a set of sub-modules, each guaranteeing a specific set of sub-properties. All VRASED sub-modules, both hardware and software, are individually verified. Finally, the composition of all sub-modules is proved to achieve formal definitions of RA soundness and security. RA soundness guarantees that an integrity-ensuring function (HMAC in VRASED's case) is correctly computed on the memory being attested. Moreover, it guarantees that attested memory cannot be modified after the start of RA computation, protecting against "hide-and-seek" attacks caused by self-relocating malware [31]. RA security ensures that RA execution generates an unforgeable authenticated memory measurement and that the secret key K used in computing this measurement is not leaked before, during, or after attestation. Figure 1 illustrates the VRASED architecture. To achieve the aforementioned goals, VRASED's software (SW-Att in Figure 1) is stored in Read-Only Memory (ROM) and relies on a formally verified HMAC implementation from the HACL* cryptographic library [38]. A typical execution of SW-Att is carried out as follows:
1) Read challenge Chal from memory region MR.
2) Derive a one-time key from Chal and the attestation master key K.
3) Generate an attestation token H by computing an HMAC over an attested memory region AR using the derived key:

$$H = \mathrm{HMAC}(\mathrm{KDF}(K, Chal), AR)$$

[Figure 1: VRASED architecture. Recoverable labels: MCU core, memory backbone.]

[...]
• irq: Signal that indicates if an interrupt is happening (1-bit);

These signals are used to determine a one-bit reset signal output that, when set to 1, triggers an immediate system-wide MCU reset, i.e., before execution of the next instruction. The reset output is triggered when VRASED's hardware detects any violation of security properties. VRASED's hardware is described in Register Transfer Level (RTL) using Finite State Machines (FSMs). Then, NuSMV Model Checker [45] is used to automatically prove that such FSMs achieve claimed security sub-properties. Finally, the proof that the conjunction of hardware and software sub-properties implies end-to-end soundness and security is done using an LTL theorem prover. More formally, VRASED's end-to-end security proof guarantees that no probabilistic polynomial time (PPT) adversary can win the security game in Definition 1 with more than negligible probability in the security parameter.

Definition 1 (VRASED's Security Game [17]).
1.1 RA Security Game (RA-game):
Notation:
- l is the security parameter and |K| = |Chal| = |MR| = l
- AR(t) denotes the content of AR at time t
RA-game:
1) Setup: Adv is given oracle access to SW-Att calls.
2) Challenge: A random challenge $Chal \leftarrow_{\$} \{0,1\}^l$ is generated and given to Adv.
3) Response: Adv responds with a pair (M, σ), where σ is either forged by Adv, or is the result of calling SW-Att at some arbitrary time t.
4) Adv wins if and only if M ≠ AR(t) and σ = HMAC(KDF(K, Chal), M).
1.2 RA Security Definition: An RA scheme is considered secure if for all PPT adversaries Adv, there exists a negligible function negl such that:

$$\Pr[Adv, \text{RA-game}] \le negl(l)$$
IV. PROOF OF EXECUTION (PoX) SCHEMES
A Proof of Execution (PoX) is a scheme² involving two parties: (1) a trusted verifier Vrf, and (2) an untrusted (potentially infected) prover Prv. Informally, the goal of PoX is to allow Vrf to request the execution of specific software S by Prv. As a part of PoX, Prv must reply to Vrf with an authenticated unforgeable cryptographic proof (H) that convinces Vrf that Prv indeed executed S. To accomplish this, H must prove that: (1) S executed atomically, in its entirety, and that such execution occurred on Prv (and not on some other device); and (2) any claimed result/output value of such execution, that is accepted as legitimate by Vrf, could not have been spoofed or modified. In addition, the size and behavior (i.e., instructions) of S, as well as the size of its output (if any), should be configurable and optionally specified by Vrf. In other words, PoX should provide proofs of execution for arbitrary software, along with corresponding authenticated outputs. Definition 2 specifies PoX schemes in more detail.
We now justify the need to include atomic execution of S in the definition of PoX. On low-end MCUs, software typically runs on "bare metal" and, in most cases, there is no mechanism to enforce memory isolation between applications. Therefore, allowing S's execution to be interrupted would permit other (potentially malicious) software running on Prv to alter the behavior of S. This might be done, for example, by an application that interrupts execution of S and changes intermediate computation results in S's data memory, thus tampering with its output or control flow. Another example is an interrupt that resumes S at a different instruction, thereby modifying S's execution flow. Such an action could modify S's behavior completely via return-oriented programming (ROP).
A. PoX Adversarial Model & Security Definition
We consider an adversary, Adv, that might control Prv's entire software state, code, and data. Adv can modify any writable memory and read any memory that is not explicitly protected by (hardware-enforced) access control rules, i.e., it can read anything (including secrets) that is not explicitly protected by the "trusted" hardware. Adv may also have full control over all Direct Memory Access (DMA) controllers on Prv. DMA allows a hardware controller to directly access main memory (e.g., RAM, flash or ROM) without going through the CPU.
We consider a scheme PoX = (XRequest, XAtomicExec, XProve, XVerify) to be secure if the aforementioned Adv has negligible probability of convincing Vrf that S executed successfully when, in reality, such execution did not take place or was interrupted. In addition, we require that, if execution of S takes place, Adv cannot tamper with, or influence, this execution's outputs. These notions are formalized by the security game in Definition 3.
We note that Definition 3 binds execution of S to the time between Vrf issuing the request and receiving the response. Therefore, if a PoX scheme is secure according to this definition, Vrf can be certain about freshness of the execution. In the same vein, the output produced by such execution is also guaranteed to be fresh. This timeliness property is important to avoid replays of previous valid executions; in fact, it is essential for safety-critical applications. (See Section VII for examples.)

Physical Attacks: physical and hardware attacks are out of scope in this paper. Specifically, Adv cannot modify the code in ROM, induce hardware faults, or retrieve Prv secrets via physical-presence side channels. Protection against such attacks is considered orthogonal and could be supported via standard physical-security techniques [46].
V. VAPE: A SECURE PoX ARCHITECTURE
We now present VAPE, a PoX architecture that realizes our PoX security definition presented in Definition 3. One key aspect of VAPE is a computer-aided formally verified and publicly available implementation thereof. This section first provides some intuition behind VAPE's design. All VAPE properties, which are overviewed informally in this section, are formalized in Section VI.
In the rest of this section we use the term "unprivileged software" to refer to any software other than SW-Att code from VRASED. Adv is allowed to overwrite or bypass any "unprivileged software". Meanwhile, "trusted software" refers to VRASED's implementation of SW-Att (see Section III for details) which is formally verified and cannot be modified by Adv, since it is stored in ROM. VAPE is designed such that no changes to SW-Att are required. Therefore, both functionalities (RA and PoX, i.e., VRASED and VAPE) can coexist on the same device without interfering with each other. Notation is summarized in Table I .
Definition 2 (Proof of Execution (PoX) Scheme).
A Proof of Execution (PoX) scheme is a tuple of algorithms [XRequest, XAtomicExec, XProve, XVerify] performed between Prv and Vrf where:
1) XRequest^{Vrf→Prv}(S, ·): is an algorithm executed by Vrf which takes as input some software S (consisting of a list of instructions {s_1, s_2, ..., s_m}). Vrf expects an honest Prv to execute S. XRequest generates a challenge Chal, and embeds it alongside S, into an output request message asking Prv to execute S, and to prove that such execution took place.
2) XAtomicExec^{Prv}(ER, ·): an algorithm (with possible hardware-support) that takes as input some executable region ER in Prv's memory, containing a list of instructions {i_1, i_2, ..., i_m}. XAtomicExec runs on Prv and is considered successful iff: (1) instructions in ER are executed from its first instruction, i_1, and end at its last instruction, i_m; (2) ER's execution is atomic, i.e., if E is the sequence of instructions executed between i_1 and i_m, then {e | e ∈ E} ⊆ ER; and (3) ER's execution flow is not altered by external events, i.e., MCU interrupts or DMA events. The XAtomicExec algorithm outputs a string O. Note that O may be a default string (⊥) if ER's execution does not result in any output.
3) XProve^{Prv}(ER, Chal, O, ·): an algorithm (with possible hardware-support) that takes as input some ER, Chal and O and is run by Prv to output H, i.e., a proof that XRequest^{Vrf→Prv}(S, ·) and XAtomicExec^{Prv}(ER, ·) happened (in this sequence) and that O was produced by XAtomicExec^{Prv}(ER, ·).
4) XVerify^{Prv→Vrf}(H, O, S, Chal, ·): an algorithm executed by Vrf with the following inputs: some S, Chal, H and O. The XVerify algorithm checks whether H is a valid proof of the execution of S (i.e., executed memory region ER corresponds to S) on Prv given the challenge Chal, and if O is an authentic output/result of such an execution. If both checks succeed, XVerify outputs 1, otherwise it outputs 0.
Remark: In the parameters list, (·) denotes that additional parameters might be included depending on the specific PoX construction.
Fig. 2. Definition of Proof of Execution (PoX) Scheme
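The three success conditions of XAtomicExec can be read as a predicate over an execution trace. The Python sketch below checks them on a recorded sequence of program counter values. The trace representation, the parameter names (`last_instr`, `external_event`), and the example addresses are hypothetical, and a real architecture would enforce these conditions in hardware during execution rather than check a log afterwards.

```python
def atomic_exec_ok(pc_trace, er_min, er_max, last_instr, external_event=None):
    """Check the three XAtomicExec conditions on a recorded PC trace.

    pc_trace       -- program counter value at each executed instruction
    er_min, er_max -- first and last address of the executable region ER
    last_instr     -- address of ER's last instruction (assumed known)
    external_event -- per-step flags: True if an interrupt or DMA event occurred
    """
    if not pc_trace:
        return False
    if external_event is None:
        external_event = [False] * len(pc_trace)

    # (1) Execution starts at ER's first instruction and ends at its last one.
    starts_and_ends = pc_trace[0] == er_min and pc_trace[-1] == last_instr
    # (2) Every executed instruction lies inside ER.
    stays_inside = all(er_min <= pc <= er_max for pc in pc_trace)
    # (3) No interrupt or DMA event occurred during execution.
    undisturbed = not any(external_event)

    return starts_and_ends and stays_inside and undisturbed


# Hypothetical ER spanning 0xE000..0xE009, with its last instruction at 0xE008.
ER_MIN, ER_MAX, LAST = 0xE000, 0xE009, 0xE008
clean_run = atomic_exec_ok([0xE000, 0xE002, 0xE006, 0xE008], ER_MIN, ER_MAX, LAST)
escaped_run = atomic_exec_ok([0xE000, 0xF000, 0xE006, 0xE008], ER_MIN, ER_MAX, LAST)
```

In this example `clean_run` satisfies all three conditions, while `escaped_run` fails condition (2) because one executed address lies outside ER.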
Definition 3 (PoX Security Game).
- Let t_req denote the time when Vrf issues Chal ← XRequest^{Vrf→Prv}(S).
- Let t_verif denote the time when Vrf receives H and O back from Prv in response to XRequest^{Vrf→Prv}.
- Let XAtomicExec^{Prv}(S, t_req → t_verif) denote that XAtomicExec^{Prv}(ER, ·), such that ER ≡ S, was invoked and completed within the time interval [t_req, t_verif].
[...]
PoX Security Definition:
A PoX scheme is considered secure for security parameter l if, for all PPT adversaries Adv, there exists a negligible function negl such that:

$$\Pr[Adv, \text{PoX-game}] \le negl(l)$$

Fig. 3. Definition of PoX Security Game
A. Protocol and Architecture

VAPE implements a secure PoX = (XRequest, XAtomicExec, XProve, XVerify) scheme conforming to Definition 4. The steps in the VAPE workflow are illustrated in Figure 5. The main idea is to first execute the code contained in ER. Then, at some later time, VAPE invokes VRASED's verified RA functionality to attest the code in ER and include, in the attestation result, additional information that allows Vrf to verify that ER's code actually executed. If ER's execution produces an output (e.g., Prv is a sensor running ER's code to obtain some physical/ambient quantity), authenticity and integrity of this output can also be verified. This is achieved by including the EXEC flag among the inputs to the HMAC computed as part of VRASED's RA. The value of this flag is controlled by VAPE's formally verified hardware, and its memory cannot be written by any software running on Prv. The VAPE hardware module runs in parallel with the MCU, monitoring its behavior and deciding the value of EXEC accordingly.

Figure 6 depicts VAPE's architecture. In addition to the VRASED hardware that provides secure RA by monitoring a set of CPU signals (see Section III-B for details), VAPE also monitors values stored in a dedicated physical memory region, METADATA. METADATA contains addresses/pointers to the memory boundaries of ER (i.e., ERmin and ERmax) and the memory boundaries of the expected output: ORmin and ORmax. These addresses are sent by Vrf as part of XRequest, and are configurable at run-time. The code S to be stored in ER is optionally sent by Vrf.

METADATA includes the EXEC flag, which is initialized to 0 and only changes from 0 to 1 (by VAPE's hardware) when ER's execution starts, i.e., when the PC points to ERmin. Afterwards, any violation of VAPE's security properties (detailed in Section V-C) immediately changes EXEC back to 0. After a violation, the only way to set the flag back to 1 is to re-start execution of ER from the very beginning, i.e., with PC = ERmin. In other words, VAPE's verified hardware makes sure that the EXEC value covered by the HMAC is 1 only if ER's code executed successfully.

In addition to EXEC, the HMAC covers a set of parameters (in the METADATA memory region) that allows Vrf to check whether the executed software was indeed located in ER = [ERmin, ERmax]. If any output is expected, Vrf specifies a memory range OR = [ORmin, ORmax] for storing output. Contents of OR are also covered by the computed HMAC, allowing Vrf to verify authenticity of the output of the execution. The VAPE protocol is presented in Definition 4.

Definition 4 (Proof of Execution Protocol). VAPE instantiates a PoX = (XRequest, XAtomicExec, XProve, XVerify) scheme behaving as follows:

1) XRequest^{Vrf→Prv}(S, ERmin, ERmax, ORmin, ORmax): includes a set of configuration parameters ERmin, ERmax, ORmin, ORmax. The Executable Range (ER) is a contiguous memory block in which S is to be installed: ER = [ERmin, ERmax]. Similarly, the Output Range (OR) is also configurable and defined by Vrf's request as OR = [ORmin, ORmax]. If S does not produce any output, ORmin = ORmax = ⊥. S is the software to be installed in ER and executed. If S is unspecified (S = ⊥), the protocol executes whatever code was pre-installed in ER on Prv, i.e., Vrf is not required to provide S in every request, only when it wants to update ER's contents before executing it. If the code for S is sent by Vrf, untrusted auxiliary software in Prv is responsible for copying S into ER. Prv also receives a random l-bit challenge Chal (|Chal| = l) as part of the request, where l is the security parameter.

2) XAtomicExec^{Prv}(ER, OR, METADATA): This algorithm starts with unprivileged auxiliary software writing the values of ERmin, ERmax, ORmin, ORmax and Chal to a special pre-defined memory region denoted by METADATA. VAPE's verified hardware enforces immutability, atomic execution and access control rules according to the values stored in METADATA; details are described in Section V-A. Finally, it begins execution of S by setting the program counter to the value of ERmin.

3) XProve^{Prv}(ER, Chal, OR): produces a proof of execution H. H allows Vrf to decide whether: (1) the code contained in ER actually executed; (2) ER contained the specified (expected) code of S during execution; (3) this execution is fresh, i.e., performed after the most recent XRequest; and (4) the claimed output in OR was indeed produced by this execution. As mentioned earlier, VAPE uses VRASED's RA architecture to compute H by attesting at least the executable, along with its output and the corresponding execution metadata. More formally, H is the HMAC computed by VRASED's RA over ER, OR, and METADATA.
METADATA also contains the EXEC flag, which is read-only to all software running on Prv and can only be written by VAPE's formally verified hardware. This hardware monitors execution and sets EXEC = 1 only if ER executed successfully (XAtomicExec) and the memory regions of METADATA, ER, and OR were not modified between the end of ER's execution and the computation of H. The reasons for these requirements are detailed in Section V-C. If any malware residing on Prv attempts to violate any of these properties, VAPE's verified hardware (provably) sets EXEC to zero. After computing H, Prv returns it, along with the contents of OR (O) produced by ER's execution, to Vrf.

4) XVerify^{Prv→Vrf}(H, O, S, METADATA_Vrf): Upon receiving H and O, Vrf checks whether H was produced by a legitimate execution of S and reflects the parameters specified in XRequest, i.e., METADATA_Vrf = Chal || ORmin || ORmax || ERmin || ERmax || (EXEC = 1). This way, Vrf concludes that S successfully executed on Prv and produced output O if H equals the HMAC that Vrf computes over S, O, and METADATA_Vrf (Equation 2).
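The XProve and XVerify steps can be sketched as follows. This is a minimal illustration in Python, not VAPE's implementation: the choice of HMAC-SHA256, the 16-bit little-endian address encoding, the concatenation order of the HMAC inputs, and the helper names pack_metadata, xprove, and xverify are all assumptions made for the example.

```python
import hashlib
import hmac
import struct


def pack_metadata(chal: bytes, er_min: int, er_max: int,
                  or_min: int, or_max: int, exec_flag: int) -> bytes:
    # Chal || ORmin || ORmax || ERmin || ERmax || EXEC (encoding is an assumption).
    return chal + struct.pack("<HHHHB", or_min, or_max, er_min, er_max, exec_flag)


def xprove(key: bytes, er: bytes, out: bytes, metadata: bytes) -> bytes:
    # Prv side: MAC over the executable, its output, and the execution metadata.
    return hmac.new(key, er + out + metadata, hashlib.sha256).digest()


def xverify(key: bytes, h: bytes, out: bytes, s: bytes, chal: bytes,
            er_min: int, er_max: int, or_min: int, or_max: int) -> bool:
    # Vrf side: recompute the MAC over the expected S and METADATA with EXEC = 1.
    expected_md = pack_metadata(chal, er_min, er_max, or_min, or_max, 1)
    expected = hmac.new(key, s + out + expected_md, hashlib.sha256).digest()
    return hmac.compare_digest(h, expected)
```

Because Vrf always recomputes the MAC with EXEC = 1, a proof generated while the hardware holds EXEC = 0 fails verification, as does a proof over a modified output.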
Remark: Our notion of successful execution requires S to have a single exit point, ERmax. Any self-contained code with multiple legal exits can be trivially instrumented to have a single exit point as follows: replace each exit instruction with a jump to the unified exit point ERmax. This notion also requires S to run atomically. Since this constraint might be undesirable in some real-time systems, we discuss how it can be relaxed in Appendix VIII-C. Finally, Vrf is responsible for defining the OR memory region according to the behavior of S. OR should be large enough to fit all output produced by S, and OR's boundaries should correspond to the addresses where S writes the output values to be sent to Vrf.
B. MCU Assumptions
As mentioned in Section V-A, VAPE extends VRASED to enable a verified architecture for proofs of execution. Therefore, we assume the same machine model introduced in VRASED and make no additional assumptions. We review these assumptions throughout the rest of this section and then formalize them as an LTL machine model in Section VI.
Verification of the entire CPU is beyond the scope of this paper. Therefore, we assume the CPU architecture strictly adheres to, and correctly implements, its specifications. In particular, our design and verification rely on the following simple axioms:
A1 - Program Counter (PC): PC always contains the address of the instruction being executed in a given CPU cycle.
A2 - Memory Address: Whenever memory is read or written, a data-address signal (D_addr) contains the address of the corresponding memory location. For a read access, a data read-enable bit (R_en) must be set, while, for a write access, a data write-enable bit (W_en) must be set.
A3 - DMA: Whenever the DMA controller attempts to access the main system memory, a DMA-address signal (DMA_addr) reflects the address of the memory location being accessed and a DMA-enable bit (DMA_en) must be set. DMA cannot access memory when DMA_en is off (logical zero).
A4 - MCU Reset: At the end of a successful reset routine, all registers (including PC) are set to zero before resuming normal software execution flow. Resets are handled by the MCU in hardware. Thus, the reset handling routine cannot be modified. When a reset happens, the corresponding reset signal is set. The reset signal is also set when the MCU initializes for the first time.
A5 - Interrupts: When interrupts happen, the corresponding irq signal is set.
C. VAPE's Sub-Properties at a High-Level
We now describe the sub-properties enforced by VAPE. Section VI formalizes these sub-properties in LTL and provides a single end-to-end definition for VAPE's correctness. This end-to-end correctness notion is provably implied by the composition of all sub-properties. The sub-properties fall into two major groups: Execution Protection and Metadata Protection. A violation of any of these properties implies one or more of the following:
• Code in ER was not executed atomically and in its entirety;
• Output in OR was not produced by ER's execution;
• Code in ER was not executed in a timely manner, i.e., after receiving the latest XRequest.
Therefore, whenever VAPE detects any violation, EXEC is set to 0. Then, since EXEC is included among the inputs to the computation of HMAC (conveyed in Prv's response), it will be interpreted by Vrf as a failure to prove execution of the code in ER.
P5 - Response Protection: The appropriate response to Vrf's challenge must be unforgeable and non-invertible. This implies that, in the XProve routine, the key K used to compute HMAC must never be leaked (with non-negligible probability) and the HMAC implementation must be functionally correct (adhere to its cryptographic specification). Moreover, the contents of the memory being attested must not change during the HMAC computation. We rely on VRASED's verified RA architecture to ensure these properties. Also, to ensure trustworthiness of the response, VAPE guarantees that no software in Prv can ever modify the EXEC flag and that, once EXEC = 0, it can only become 1 again if ER's execution re-starts completely.
P6 - Challenge Temporal Consistency: VAPE must ensure that Chal cannot be modified between ER's execution and the HMAC computation in XProve. Without this property, the following attack is possible: (1) Prv-resident malware first executes ER properly (i.e., without violating P1-P5), resulting in EXEC = 1 after execution stops, and (2) at some later time, malware receives Chal from Vrf and simply calls XProve on this Chal without executing ER. As a result, malware would acquire a valid proof of execution (since EXEC remains 1 when the proof is generated) even though no ER execution occurred after Chal was received. Such attacks can be prevented by setting EXEC = 0 whenever the memory region storing Chal is modified.
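The stale-execution attack behind P6 can be illustrated with a toy model. This is a minimal sketch, assuming only the two rules stated above (a write to the Chal memory clears EXEC; a complete, violation-free run of ER leaves EXEC = 1); the class and method names are hypothetical and the model ignores all other properties.

```python
class ExecFlagModel:
    """Toy model of the EXEC rules relevant to P6 (not the verified hardware)."""

    def __init__(self):
        self.exec_flag = 0
        self.chal = None

    def write_chal(self, chal: bytes) -> None:
        # Chal lives in METADATA: any write to it clears EXEC.
        self.chal = chal
        self.exec_flag = 0

    def run_er(self) -> None:
        # Stands in for a complete, violation-free execution of ER from ERmin.
        self.exec_flag = 1

    def xprove_inputs(self):
        # The (Chal, EXEC) pair that would be covered by the HMAC.
        return (self.chal, self.exec_flag)
```

In this model, running ER under an old challenge and then writing a fresh one yields (new Chal, EXEC = 0), so the stale execution cannot be reused; only re-running ER after the new challenge is stored yields EXEC = 1.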
D. Formal Verification Methodology
Our formal verification approach starts by formalizing the RA sub-properties discussed in this section using Linear Temporal Logic (LTL) to define invariants that must hold throughout the entire execution. We then use a theorem prover [47] to write a computer-aided proof that the conjunction of the LTL sub-properties implies an end-to-end formal definition of the guarantee expected from VAPE's hardware. VAPE's correctness, when properly composed with VRASED's guarantees, yields a PoX scheme secure according to Definition 3. This is proved by showing that, if the composition between the two is implemented as described in Definition 4, VRASED's security can be reduced to VAPE's security. For more details, see Section VI.
The VAPE hardware module is composed of several sub-modules written in the Verilog Hardware Description Language (HDL). Each sub-module is responsible for enforcing a set of LTL sub-properties and is described as an FSM in: (1) Verilog at Register Transfer Level (RTL); and (2) the model-checking language SMV [43]. We then use the NuSMV model checker to verify that the FSM complies with the LTL specifications. If verification fails, the sub-module is re-designed.
Once each sub-module is verified, they are combined into a single Verilog design. The composition is converted to SMV using the automatic translation tool Verilog2SMV [44]. The resulting SMV is simultaneously verified against all LTL specifications to prove that the final Verilog design for HW-Mod complies with all necessary properties. Automatic conversion of the composition of HW-Mod from Verilog to SMV rules out the possibility of human mistakes in representing Verilog FSMs as SMV.
VI. FORMAL SPECIFICATION & VERIFIED IMPLEMENTATION
We now describe VAPE's formally verified implementation. We start by defining a generic machine model for low-end embedded systems, composed of a subset of VRASED's machine model and expressed in LTL. Then, we formally state the end goal of VAPE's implementation. Next, we prove that a set of LTL sub-properties, corresponding to the formal specification of P1-P6, when applied to this machine model, implies VAPE's end goal. Finally, we implement VAPE's hardware by applying the methodology described in Section V-D to verify that the implementation conforms to all LTL sub-properties, thus implying VAPE's end goal.
A. Machine Model
Definition 5 models the behavior of low-end MCUs, as described in Section I-A. It consists of a subset of the machine model introduced by VRASED. Nonetheless, this subset models all MCU behavior that is relevant for stating and verifying the correctness of VAPE's implementation.
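Definition 5. Machine model (subset).
1) Modify_Mem(i) → (W_en ∧ D_addr = i) ∨ (DMA_en ∧ DMA_addr = i)
2) Interrupt → irq
3) MR, CR, AR, KR, XS, and METADATA are non-overlapping memory regions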
Modify_Mem models that a given memory address can be modified in two cases: by a CPU instruction or by DMA. In the former, the W_en signal must be on and D_addr must contain the memory address being accessed. In the latter, the DMA_en signal must be on and DMA_addr must contain the address being modified by DMA. The requirements for reading from a given address are similar, except that instead of W_en, R_en must be on. We do not explicitly state this behavior since it is not used in VAPE's proofs. For the same reason, modeling the effects of instructions that only modify register values (e.g., ALU operations, such as add and mul) is also not necessary. The machine model also captures the fact that, when an interrupt happens during execution, the irq signal in MCU hardware is set to 1.
With respect to memory layout, the model states that MR, CR, AR, KR, XS, and METADATA are disjoint memory regions. The first five memory regions are defined in VRASED, as shown in Figure 1. As shown in Figure 6, METADATA is a fixed memory region used by VAPE to store information about software execution status.
B. Security & Implementation Correctness
Our strategy in proving that VAPE is a secure PoX architecture (according to Definition 3) is two-part:
[A]: We show that properties P1-P6, discussed in Section V-C and formally specified next in Section VI-C, are sufficient to guarantee that EXEC flag is 1 if and only if S indeed executed on Prv. To show this, we compose a computer proof using SPOT LTL proof assistant [47] .
[B]: We use cryptographic reduction proofs to show that, as long as part A holds, VRASED security with respect to Definition 1 can be reduced to VAPE's PoX security from Definition 3. In turn, HMAC's existential unforgeability can be reduced to VRASED's security [17] . Therefore, both VAPE and VRASED rely on the assumption that HMAC is a secure MAC. In the rest of this section, we convey the intuition behind both of these steps. Proof details are in Appendix A.
The goal of part A above is to show that VAPE's sub-properties imply Definition 6. The LTL specification in Definition 6 captures the conditions that must hold in order for EXEC to be set to 1 during execution of XProve, enabling generation of a valid proof of execution. This specification ensures that, in order to have EXEC = 1 during execution of XProve (i.e., for [EXEC ∧ PC ∈ CR] to hold), at least once before such time the following must have happened:
1) The system reached a state S0 where the software stored in ER started executing from its first instruction (PC = ERmin).
2) The system eventually reached a state S1 when ER finished executing (PC = ERmax). In the interval between S0 and S1, PC kept executing instructions within ER, there were no interrupts, no resets, and DMA remained inactive.
3) The system eventually reached a state S2 when XProve started executing (PC = CRmin). In the interval between S0 and S2, the METADATA and ER regions were not modified.
4) In the interval between S0 and S2, the OR region was only modified by ER's execution, i.e., PC ∈ ER ∨ ¬Modify_Mem(OR).
Figure 7 shows the time windows wherein each memory region must not change during VAPE's PoX, as implied by VAPE's correctness (Definition 6). Violating any of these conditions will cause EXEC to have value 0 during XProve's computation. Consequently, any violation will result in Vrf rejecting the proof of execution, since it will not conform to the expected value of H, per Equation 2 in Definition 4.
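The four conditions above can be read as a check over a cycle-by-cycle trace. The following is a minimal sketch of such a check in Python, intended only to make the S0/S1/S2 windows concrete; it is not the LTL formula of Definition 6. The Cycle record, the representation of regions as inclusive (min, max) address pairs, and the function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    pc: int
    irq: bool = False
    reset: bool = False
    dma_en: bool = False
    writes: frozenset = frozenset()  # addresses modified in this cycle


def inside(addr, region):
    lo, hi = region
    return lo <= addr <= hi


def modifies(cycle, region):
    return any(inside(a, region) for a in cycle.writes)


def exec_may_be_one(trace, er, out, meta, cr_min):
    """True if some S0 -> S1 -> S2 window in the trace meets conditions 1-4."""
    er_min, er_max = er
    for s0, first in enumerate(trace):
        if first.pc != er_min:                      # condition 1: start at ERmin
            continue
        s1 = None
        for i in range(s0, len(trace)):             # condition 2: atomic run to ERmax
            c = trace[i]
            if not inside(c.pc, er) or c.irq or c.reset or c.dma_en:
                break
            if c.pc == er_max:
                s1 = i
                break
        if s1 is None:
            continue
        s2 = next((i for i in range(s1 + 1, len(trace))
                   if trace[i].pc == cr_min), None)  # XProve starts at CRmin
        if s2 is None:
            continue
        window = trace[s0:s2 + 1]
        if any(modifies(c, meta) or modifies(c, er) for c in window):
            continue                                # condition 3 violated
        if any(modifies(c, out) and not inside(c.pc, er) for c in window):
            continue                                # condition 4 violated
        return True
    return False
```

For instance, a trace that enters ER at ERmin, writes OR from within ER, reaches ERmax, and then jumps to CRmin passes the check, whereas the same trace with an interrupt inside ER, or with a write to OR after ER has finished, does not.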
The intuition behind the cryptographic reduction (part B of our proof strategy) is that computing the token H consists simply of invoking VRASED's SW-Att with MR = Chal, ER ∈ AR, OR ∈ AR, and METADATA ∈ AR. Therefore, a successful forgery of VAPE's H implies breaking VRASED's security. Since H always includes the value of EXEC, this implies that VAPE is PoX-secure (Definition 3). The complete reduction is presented in Appendix A.
C. VAPE's Sub-Properties in LTL
We now introduce formal definitions for the necessary sub-properties enforced by VAPE as LTL specifications 3-12 in Definition 7. We describe how they map to the high-level notions P1-P6 discussed in Section V-C. Appendix A discusses a computer proof that the conjunction of this set of properties is sufficient to satisfy the formal definition of VAPE's correctness from Definition 6. Then, Section VI-D shows examples of VAPE hardware sub-modules, designed as FSMs and verified to enforce the properties in Definition 7.
LTL 3 enforces P1 -Ephemeral immutability by making sure that whenever ER memory region is written, either by CPU or DMA, EXEC is immediately set to logical 0 (false).
P2 - Ephemeral Atomicity is enforced by a set of three LTL specifications. LTL 4 enforces that the only way for ER's execution to terminate, without setting EXEC to logical 0, is through its last instruction: PC = ERmax. This is specified by checking the relation between the current and next PC values using the LTL neXt operator. In particular, if the current PC value is within ER, and the next PC value is outside ER, then either the current PC value is the address of ERmax, or EXEC is set to 0 in the next cycle. Also, LTL 5 enforces that the only way for PC to enter ER is through the very first instruction: ERmin. This prevents ER's execution from starting at some point in the middle of ER, thus making sure that ER always executes in its entirety. Finally, LTL 6 enforces that EXEC is set to zero if an interrupt happens in the middle of ER's execution. Even though LTLs 4 and 5 already enforce that PC cannot change to anywhere outside ER, interrupts could be programmed to return to an arbitrary instruction within ER. Although this would not violate LTLs 4 and 5, it would still modify ER's behavior. Therefore, LTL 6 is needed to prevent that.
P3 - Output Protection is enforced by LTL 7 by making sure that: (1) the DMA controller does not write into OR; (2) the CPU can only modify OR when executing instructions within ER; and (3) DMA cannot be active during ER's execution; otherwise, a compromised DMA could change intermediate results in data memory.
Definition 6. Formal specification of VAPE's correctness.
Definition 7. Necessary Sub-Properties for Secure Proofs of Execution in LTL, grouped as: Ephemeral Immutability; Ephemeral Atomicity; Output Protection; Executable/Output (ER/OR) Boundaries & Challenge Temporal Consistency; Response Protection. Remark: Note that Chal_mem ∈ METADATA.
Fig. 7. Illustration of time intervals that each memory region must remain unchanged in order to produce a valid H (EXEC = 1). t(X) denotes the time when PC = X.
Similar to P3, P4 - Executable/Output Boundaries and P6 - Challenge Temporal Consistency are enforced by LTL 10. Since Chal as well as ERmin, ERmax, ORmin, and ORmax are all stored in the reserved METADATA memory region, it suffices to ensure that EXEC is set to logical 0 whenever this region is modified. Also, LTL 8 enforces that EXEC is only set to one if ER and OR are configured (by the METADATA values ERmin, ERmax, ORmin, ORmax) as valid memory regions.
Finally, LTLs 11 and 12 (in addition to VRASED's verified RA architecture) are responsible for ensuring P5 - Response Protection by making sure that EXEC always reflects what is intended by VAPE's hardware. LTL 11 specifies that the only way to change EXEC from 0 to 1 is by starting ER's execution over. Finally, LTL 12 states that, whenever a reset happens (this also includes the system's initial booting state) and execution is initialized, the initial value of EXEC is 0.
To conclude, we recall that no software running on Prv can modify EXEC. Therefore, it is not possible for malware to change it directly.
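For orientation, the prose descriptions in this section can be rendered as LTL formulas along the following lines. This is a hedged reading of the text, not a reproduction of Definition 7: the exact cycle at which EXEC is cleared (current versus next) and the precise form of each antecedent are assumptions, and LTLs 7-9 are omitted because their descriptions involve several conjuncts. G and X denote the usual "globally" and "next" operators.

```latex
\begin{align*}
\text{LTL 3:}\quad  & \mathbf{G}\,\{\mathit{Modify\_Mem}(ER) \rightarrow \neg EXEC\}\\
\text{LTL 4:}\quad  & \mathbf{G}\,\{[(PC \in ER) \wedge \neg\mathbf{X}(PC \in ER)]
                      \rightarrow [PC = ER_{max} \vee \neg\mathbf{X}(EXEC)]\}\\
\text{LTL 5:}\quad  & \mathbf{G}\,\{[\neg(PC \in ER) \wedge \mathbf{X}(PC \in ER)]
                      \rightarrow [\mathbf{X}(PC = ER_{min}) \vee \neg\mathbf{X}(EXEC)]\}\\
\text{LTL 6:}\quad  & \mathbf{G}\,\{[(PC \in ER) \wedge irq] \rightarrow \neg EXEC\}\\
\text{LTL 10:}\quad & \mathbf{G}\,\{\mathit{Modify\_Mem}(METADATA) \rightarrow \neg EXEC\}\\
\text{LTL 11:}\quad & \mathbf{G}\,\{[\neg EXEC \wedge \mathbf{X}(EXEC)]
                      \rightarrow \mathbf{X}(PC = ER_{min})\}\\
\text{LTL 12:}\quad & \mathbf{G}\,\{reset \rightarrow \neg EXEC\}
\end{align*}
```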
D. Formally Verified Modules
VAPE is designed as a set of seven sub-modules. We now describe VAPE's verified implementation by focusing on two of these sub-modules and their corresponding properties. The Verilog implementation of the omitted sub-modules is available in [18]. Each sub-module enforces a subset of the LTL specifications in Definition 7. As discussed in Section III-A, sub-modules are designed as FSMs. In particular, we implement them as Mealy FSMs, i.e., their output changes as a function of both the current state and the current input values. Each FSM takes as input a subset of the signals shown in Figure 6 and produces only one output, EXEC, indicating violation of PoX properties.
To simplify the presentation, we do not explicitly represent the value of EXEC for each state transition. Instead, we define the following implicit representation:
1) EXEC is 0 whenever an FSM transitions to the NotExec state;
2) EXEC remains 0 until a transition leaving the NotExec state is triggered;
3) EXEC is 1 in all other states.
4) Sub-modules composition: Since all PoX properties must simultaneously hold, the value of EXEC produced by VAPE is the conjunction (logical AND) of all sub-modules' individual EXEC flags.
Figure 8 represents a verified model enforcing LTLs 4-6, corresponding to the high-level property P2 - Ephemeral Atomicity. The FSM consists of five states. notER and midER represent states when PC is: (1) outside ER, and (2) within ER, respectively, excluding the first (ERmin) and last (ERmax) instructions. Meanwhile, fstER and lstER correspond to states when PC points to the first and last instructions, respectively. The only possible path from notER to midER is through fstER. Similarly, the only path from midER to notER is through lstER. A transition to the NotExec state is triggered whenever: (1) the sequence of values for PC does not follow the aforementioned conditions, or (2) irq is logical 1 while PC is inside ER. Lastly, the only way to transition out of the NotExec state is to restart ER's execution.
Figure 9 shows the FSM verified to comply with LTL 10 (P6 - Challenge Temporal Consistency). The FSM has two states: Run and NotExec. The FSM transitions to the NotExec state and outputs EXEC = 0 whenever a violation happens, i.e., whenever METADATA is modified in software. It transitions back to Run when ER's execution is restarted without such a violation.
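The five-state atomicity FSM described above can be simulated in software. The sketch below is a Python approximation based only on the prose (the verified artifact is the Verilog/SMV model): the function names are hypothetical, and details not stated in the text, such as allowing arbitrary jumps among fstER, midER and lstER while PC stays inside ER, are assumptions.

```python
def classify(pc, er_min, er_max):
    # Map a PC value to the state it corresponds to when no violation occurs.
    if pc == er_min:
        return "fstER"
    if pc == er_max:
        return "lstER"
    if er_min < pc < er_max:
        return "midER"
    return "notER"


def step(state, pc, irq, er_min, er_max):
    target = classify(pc, er_min, er_max)
    now_inside = target != "notER"
    if state == "NotExec":
        # Only restarting ER from its first instruction leaves NotExec.
        return "fstER" if target == "fstER" and not irq else "NotExec"
    if irq and now_inside:
        return "NotExec"                      # interrupt during ER (LTL 6)
    was_inside = state != "notER"
    if not was_inside and now_inside and target != "fstER":
        return "NotExec"                      # entered ER past ERmin (LTL 5)
    if was_inside and not now_inside and state != "lstER":
        return "NotExec"                      # left ER before ERmax (LTL 4)
    return target


def exec_flag(state):
    # Implicit representation: EXEC is 0 only in NotExec.
    return 0 if state == "NotExec" else 1


def run(pcs, er_min, er_max, irqs=()):
    state = "notER"
    for i, pc in enumerate(pcs):
        state = step(state, pc, i in irqs, er_min, er_max)
    return state
```

With ER = [100, 103], the PC sequence 0, 100, 101, 102, 103, 0 ends in notER with EXEC = 1, while entering at 101, leaving from 101, or raising irq at any cycle inside ER ends in NotExec.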
E. Evaluation
VAPE incurs modest hardware overhead, compared to the VRASED baseline: ≈ 2% for registers and 12% for LUTs. The runtime to produce a proof of S execution depends on the size of S, which determines VRASED's attestation runtime used to produce H. In the most expensive or extreme case, when the entire program memory (8 kB) is occupied by ER + OR, this computation takes around 900ms on the 8MHz MSP430. Due to space limitations, a more detailed evaluation is deferred to Appendix VIII.
VII. AUTHENTICATED SENSING/ACTUATION
As discussed in Section I, an important functionality that can be realized using PoX is authenticated sensing/actuation. In this section, we demonstrate how VAPE can be used to build sensors and actuators that "can not lie".
As a running example we use a fire sensor: a safety-critical low-end embedded device commonly present in households and workplaces. Such a device consists of an MCU equipped with analog hardware capable of measuring physical/chemical quantities, e.g., temperature, humidity, and CO2 level. It is also usually equipped with actuation-capable analog hardware, such as a buzzer. Analog hardware components are directly connected to the MCU's General Purpose Input/Output (GPIO) ports. GPIO ports are physical wires directly mapped to fixed memory locations in MCU memory. Therefore, software running on the MCU can read the physical quantities directly from GPIO memory.
In this example, we consider that MCU's software periodically reads these values, and transmits them to a remote safety authority, e.g., a fire department. The safety authority then decides to take action. The MCU also triggers the buzzer actuator whenever sensed values indicate a fire. Given the safety-critical nature of this application, it is important for the safety authority to be sure that reported values are authentic and were produced by execution of expected software. Otherwise, malware could spoof such values (e.g., by not reading them from the proper GPIO). PoX can guarantee that reported values were read from the correct GPIO port (since the memory address is specified by the instructions in the ER executable), and that the produced output (stored in OR) was indeed generated by execution of ER and was not modified thereafter. Thus, upon receiving sensed values accompanied by a proof of execution, the safety authority can be sure that the reported sensed value can be trusted.
As a proof of concept, we use VAPE to implement a simple fire sensor that operates with temperature and humidity quantities. It communicates with a remote Vrf (e.g., a fire department) using a low-power ZigBee radio (https://www.zigbee.org/) typically used by low-end CPS/IoT devices. Temperature and humidity analog devices are connected to a VAPE-enabled MSP430 MCU running at 8MHz and synthesized using a Basys3 Artix-7 FPGA board. As shown in Figure 10, the MCU's GPIO ports are connected to the temperature/humidity sensor and to the buzzer.
Fig. 10. Hardware setup for a fire sensor.
VAPE is used to prove execution of the fire sensor software. This software is shown in Figure 12a in Appendix VIII-C. The software consists of two main functions: ReadSensor and SoundAlarm. Proofs of execution are requested by the safety authority via an XRequest that issues commands to execute these functions. ReadSensor reads and processes the value generated by the temperature/humidity analog device through its memory-mapped GPIO, and copies this value to OR. The SoundAlarm function turns the buzzer on for 2 seconds, i.e., it writes "1" to the memory address mapped to the buzzer, busy-waits for 2 seconds, and then writes "0" to the same memory address. This implementation corresponds to the one in the open-source repository (footnote 5) and was ported to a VAPE-enabled MCU. Our porting effort was minimal; it involved around 30 additional lines of C code, mainly for re-implementing sub-functions that are originally implemented as shared APIs, e.g., digitalRead/Write. Finally, we transform the ported code to be compatible with VAPE's PoX architecture. Details can be found in Appendix VIII-C.
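The behavior of the two functions can be mimicked with a simulated memory map. The sketch below is a Python illustration only (the actual software is C code running on the MCU); the addresses GPIO_SENSOR, GPIO_BUZZER and OR_MIN, the one-byte masking used as "processing", and the injectable wait function are all hypothetical.

```python
import time

GPIO_SENSOR = 0x0020  # hypothetical memory-mapped sensor input
GPIO_BUZZER = 0x0021  # hypothetical memory-mapped buzzer output
OR_MIN = 0x0400       # hypothetical first address of the output range OR


def read_sensor(mem):
    # Read the raw value from the GPIO address and copy the processed value to OR.
    raw = mem[GPIO_SENSOR]
    mem[OR_MIN] = raw & 0xFF  # placeholder processing: keep the low byte
    return mem[OR_MIN]


def sound_alarm(mem, wait=time.sleep, seconds=2.0):
    # Write 1 to the buzzer address, wait, then write 0 to the same address.
    mem[GPIO_BUZZER] = 1
    wait(seconds)
    mem[GPIO_BUZZER] = 0
```

Because the GPIO address is fixed by the instructions in ER and the result lands in OR, a proof of execution over ER and OR ties the reported value to a read of that specific port.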
VIII. CONCLUSION
This paper introduces VAPE, a novel and formally verified security service targeting low-end embedded devices. It allows a remote untrusted prover to generate unforgeable proofs of remote software execution. We envision VAPE's use in many IoT application domains, such as authenticated sensing and actuation. Our implementation of VAPE is realized on a real embedded system platform, MSP430, synthesized on an FPGA, and the verified implementation is publicly available. Our evaluation shows that VAPE has low overhead in terms of both hardware footprint and time for generating proofs of execution. We believe that this work is especially relevant to safety-critical environments and applications.
APPENDIX A: PROOFS FOR IMPLEMENTATION CORRECTNESS & SECURITY
In this section we discuss the computer proof for VAPE's implementation correctness (Theorem 1) and the reduction proof that VAPE is a secure PoX architecture as long as VRASED is a secure RA architecture (Theorem 2). A formal LTL computer proof for Theorem 1 is available at [18]. We here discuss the intuition behind such a proof. Theorem 1 states that LTLs 3-12, when considered in conjunction with the machine model in Definition 5, imply VAPE's implementation correctness.
Recall that Definition 6 states that, in order to have EXEC = 1 during the computation of XProve, at least once before such time the following must have happened:
1) The system reached a state S0 in which the software stored in ER started executing from its first instruction (PC = ERmin).
2) The system eventually reached a state S1 when ER finished executing (PC = ERmax). In the interval between S0 and S1, PC remained executing instructions within ER, there were no interrupts, no resets, and DMA remained inactive.
3) The system eventually reached a state S2 when XProve started executing (PC = CRmin). In the interval between S0 and S2, the memory regions of METADATA and ER were not modified.
4) In the interval between S0 and S2, the OR memory region was only modified by ER's software execution (PC ∈ ER ∨ ¬Modify_Mem(OR)).
The first two properties to be noted are LTL 12 and LTL 11. LTL 12 establishes that the default state of EXEC is 0. LTL 11 enforces that the only possible way to change EXEC from 0 to 1 is by having PC = ERmin. In other words, EXEC is 1 during the computation of XProve only if, at some point before that, the code stored in ER started to execute (state S0).
To see why state S1 (when ER's execution finishes, i.e., PC = ERmax) is reached and until then ER executes atomically, we look at LTLs 4, 5, 6, and 9. LTLs 4, 5 and 6 enforce that PC will stay inside ER until S1, or otherwise EXEC will be set to 0. On the other hand, it is impossible to execute instructions of XProve (PC ∈ CR) without leaving ER, because LTL 9 guarantees that ER and CR do not overlap, or EXEC = 0.
So far we have argued that to have a token H that reflects EXEC = 1, the code contained in ER must have executed successfully. What remains to be shown is that producing this token implies that the code in ER and METADATA are not modified in the interval between S0 and S2, and that only ER's execution can modify OR in the same time interval.
Clearly, the contents of ER cannot be modified after S0 because Modify_Mem(ER) directly implies that LTL 3 will set EXEC = 0. The same reasoning is applicable to modifications of the METADATA region with respect to LTL 10. The same argument applies to modifying OR, with the only exception that OR modifications are allowed only by the CPU and when PC ∈ ER (LTL 7). This means that OR can only be modified by the execution of ER. In addition, LTL 7 also ensures that DMA is disabled during the execution of ER to prevent unauthorized modification of intermediate results in data memory. Therefore, the timeline presented in Figure 7 is strictly implied by VAPE's implementation. This concludes the reasoning behind Theorem 1.
Theorem 2. VAPE is secure according to Definition 3 as long as VRASED is a secure RA architecture according to Definition 1.
Proof. Assume that Adv_PoX is an adversary capable of winning the security game in Definition 3 against VAPE with more than negligible probability. We show that, if such an Adv_PoX exists, then it can be used to construct (in a polynomial number of steps) Adv_RA that wins VRASED's security game (Definition 1) with more than negligible probability. Therefore, by contradiction, nonexistence of Adv_RA (i.e., VRASED's security) implies nonexistence of Adv_PoX (VAPE's security).
First we recall that to win VAPE's security game 
where ER_Adv is a memory region different from the one specified by Vrf on XRequest (Adv_PoX can do this by modifying METADATA to different values of ERmin and ERmax before calling XAtomicExec).
Case 1.3: Similar to Case 1.2, but ER_Adv is the same region specified by Vrf on XRequest, containing a different executable S_Adv.
We show that an adversary that succeeds in any of these cases can be used to win VRASED's security game. To see why this is the case, we note that VAPE's XProve function is implemented by using VRASED's SW-Att without any modification. SW-Att covers memory regions MR (challenge memory) and AR (attested region). Hence, VAPE instantiates these memory regions as: MR = Chal, ER ∈ AR, OR ∈ AR, and METADATA ∈ AR.
Doing so ensures that all sensitive memory regions used by VAPE are included among the inputs to VRASED's attestation. Let X(t) denote the content in memory region X at time t. Adv_RA can then be constructed using Adv_PoX as follows:
1) Adv_RA receives Chal from the challenger in step (2)

[20] as its open core implementation. We extended VRASED to implement the hardware architecture presented in Figure 6. In addition to the VAPE module in HW-Mod, we added another peripheral module responsible for storing and maintaining VAPE's METADATA. As a peripheral, the content in METADATA can be accessed at a pre-defined memory address via standard peripheral memory access. We also ensure that EXEC (which is located inside METADATA) is unmodifiable in software by removing software-write wires in hardware. Finally, we use Xilinx Vivado to synthesize an RTL description of the modified HW-Mod and deploy it on the Artix-7 FPGA class.
B. Overhead
[Table: register and look-up table (LUT) counts compared with [20] and VRASED [17].]
VAPE's hardware overhead is small compared to the baseline VRASED; it requires 2% and 12% additional registers and LUTs, respectively. In absolute numbers, it adds 44 registers and 302 LUTs to the underlying MCU. In terms of memory, VAPE requires 9 additional bytes of RAM for storing METADATA. This overhead corresponds to 0.01% of the MSP430's 16-bit address space.
Run-time. We do not observe any overhead for software execution time on the VAPE-enabled Prv. This is because VAPE does not introduce new instructions or modifications to the MSP430 ISA; VAPE's hardware runs in parallel with the original MSP430 CPU. The run-time to produce a proof of S's execution comprises: (1) the time to execute S (XAtomicExec), and (2) the time to compute an attestation token (XProve). The first depends only on the behavior of S itself (e.g., S can be a small sequence of instructions or have long loops). As mentioned above, VAPE does not affect S's run-time. XProve's run-time is linear in the total size of ER + OR. In a worst-case setting where these regions occupy the entire program memory, 8 kB, XProve takes around 900ms to complete on an 8MHz device.
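The reported worst case gives a rough per-byte cost for XProve. The following back-of-the-envelope sketch, in Python, assumes the cost is strictly proportional to |ER| + |OR| with no fixed overhead (the text only states that it is linear) and uses the approximate figure of 900ms; the function names are hypothetical.

```python
CLOCK_HZ = 8_000_000          # 8 MHz device
WORST_CASE_BYTES = 8 * 1024   # entire 8 kB program memory
WORST_CASE_SECONDS = 0.9      # "around 900ms"


def cycles_per_byte() -> float:
    # 0.9 s * 8e6 cycles/s spread over 8192 attested bytes.
    return WORST_CASE_SECONDS * CLOCK_HZ / WORST_CASE_BYTES


def xprove_seconds(er_bytes: int, or_bytes: int) -> float:
    # Proportional scaling; any constant setup cost is ignored (assumption).
    return (er_bytes + or_bytes) * cycles_per_byte() / CLOCK_HZ


def metadata_ram_fraction(metadata_bytes: int = 9) -> float:
    # Share of the 16-bit (64 kB) address space taken by METADATA.
    return metadata_bytes / 2 ** 16
```

Under these assumptions the cost is roughly 879 cycles per attested byte, so a 1 kB ER + OR would take about 0.11 s, and the 9 bytes of METADATA amount to about 0.014% of the address space, in line with the stated 0.01%.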
C. Comparison with CFA
To the best of our knowledge, VAPE is the first architecture for proofs of execution. Therefore, there are no other architectures that are directly comparable. Nonetheless, to provide a (performance and overhead) point of reference and a comparison, we contrast VAPE's overhead with that of three Control Flow Attestation (CFA) architectures. As discussed in Section II, even though CFA is not directly applicable to produce proofs of execution with authenticated outputs, we consider it to be the most closely related service, since it reports on the execution path of a program.
In this comparison, we consider three recent CFA architectures: Atrium [34] , LiteHAX [35] , and LO-FAT [33] . Figure 11 .a compares VAPE to these architectures in terms of number of additional hardware Look-Up Tables (LUTs) required. In this figure, the black dashed line represents the total cost of the MSP430 MCU: 1904 LUTs. Figure 11 .b presents a similar comparison for the amount of additional registers required by these architectures. In this case, the total cost of the MSP430 MCU itself is of 691 registers. Finally, Figure 11 .c presents the amount of dedicated RAM required by these architectures (VAPE's dedicated RAM corresponds to the exclusive access stack implemented by VRASED).
As expected, VAPE incurs much lower overhead. According to these results, the cheapest CFA architecture, LiteHAX, would represent an overhead of nearly 100% in LUTs and 300% in registers, if applied to MSP430. In addition, LiteHAX would require 150 kB of dedicated RAM. This amount is far above the entire addressable memory (64 kB) of 16-bit processors, such as MSP430. These results support our claim that CFA is not applicable to this class of low-end devices. VAPE, on the other hand, introduces a total of 12% additional LUTs and 2% additional registers. VRASED requires about 2 kB of reserved RAM, which is not increased by VAPE's support for proofs of execution.
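The percentages quoted for VAPE are relative to the VRASED baseline. As a rough cross-check, the C sketch below relates the absolute numbers reported in the evaluation (302 LUTs and 44 registers added to the underlying MCU) to the unmodified MSP430 totals (1904 LUTs, 691 registers), and tests whether a given amount of dedicated RAM fits in a 16-bit address space. The helper names are hypothetical, and 1 kB is assumed to be 1024 bytes.

```c
#include <assert.h>

/* Overhead of `added` units relative to `baseline`, in tenths of a
 * percent (per mille), rounded to nearest. */
unsigned overhead_permille(unsigned added, unsigned baseline)
{
    assert(baseline > 0u);
    return (added * 1000u + baseline / 2u) / baseline;
}

/* Returns 1 if the dedicated RAM fits in the address space, else 0. */
int fits_address_space(unsigned long ram_bytes, unsigned long space_bytes)
{
    return ram_bytes <= space_bytes;
}
```

With the figures above, `overhead_permille(302, 1904)` gives 159 (about 15.9% of the core's LUTs) and `overhead_permille(44, 691)` gives 64 (about 6.4% of its registers). LiteHAX's 150 kB of dedicated RAM does not fit in 64 kB, whereas VRASED's 2 kB does.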
APPENDIX C: EXECUTABLE LIMITATIONS
We now discuss the limitations of our approach with respect to executable types.
Shared libraries. In order to produce a valid proof, Vrf must ensure that execution of S does not depend on external code located outside its execution range ER (e.g., shared libraries). A call to such code would violate LTL 4, resulting in EXEC = 0 during the HMAC computation. One possible way to support this type of executable is to transform it into a self-contained executable by statically linking all dependencies at compile time. Another is to set ER so that it covers all external code used by S.
Self-modifying code (SMC). SMC is a type of executable that alters itself while executing. Clearly, this executable type violates LTL 3, which requires the code in ER to remain unchanged during ER's execution. It is unclear how VAPE can be adapted to support SMC; however, we are unaware of any legitimate and realistic use-case of SMC in our target bare-metal applications.
Interrupts. Our notion of successful execution in Section V-A prohibits interrupts from occurring during S's execution. This limitation can be problematic, especially for interrupt-driven programs such as those in real-time systems. Nonetheless, simply allowing interrupts during execution may enable attacks in which malware modifies intermediate execution results in data memory and consequently influences the execution output. One possible way to remedy this issue is to allow interrupts as long as all interrupt handlers are: (1) immutable from the start of execution until the end of attestation, and (2) included in the attested memory range during the attestation process. Vrf can then determine whether an interrupt that may have happened during the execution is malicious by inspecting all interrupt handlers from the proof of execution.
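Condition (2) of the interrupt remedy can be illustrated with a small verifier-side check. The C sketch below tests only that every handler entry address in an interrupt vector table lies inside the attested memory range; it does not model condition (1), which would have to be enforced on the device, nor the inspection of the handler code itself. The type and function names, and the representation of the vector table as an array of 16-bit addresses, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an attested region (bounds inclusive). */
typedef struct {
    uint16_t min;
    uint16_t max;
} mem_range_t;

/* Returns 1 if every handler entry address in the vector table lies
 * inside the attested range, and 0 otherwise. */
int handlers_in_attested_range(const uint16_t *ivt, size_t n,
                               mem_range_t attested)
{
    assert(attested.min <= attested.max);
    for (size_t i = 0; i < n; i++) {
        if (ivt[i] < attested.min || ivt[i] > attested.max) {
            return 0;   /* handler code would escape attestation */
        }
    }
    return 1;
}
```

The same range test could be applied to the shared-library case, to confirm that call targets outside the original ER are covered once ER is enlarged.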
APPENDIX D: SOFTWARE TRANSFORMATION
Recall that our notion of successful execution (in Section V-A) requires the function's entry point to be at the first instruction and the exit point to be at the last instruction. In this section, we discuss an efficient way to transform arbitrary software (besides the ones in Appendix VIII-C) implementing a function to conform with this requirement.
Lines 10-17 of Figure 12 show a (partial) implementation of the ReadSensor function described in Section VII. When converted to an executable, this implementation is not guaranteed to satisfy VAPE's executable requirement, since the compiler may choose to place one of its sub-functions, instead of ReadSensor, at the entry and/or exit points of the executable. One obvious way to fix this issue is to implement all of its sub-functions as inline functions; however, such an approach may be inefficient, as in this example it would create multiple copies of the code for the same sub-functions (e.g., digitalWrite) inside the executable.
Instead, we created dedicated functions for the entry (Lines 1-4) and exit (Lines 6-8) points, and assigned those functions to separate executable sections: ".exec.entry" for the entry and ".exec.exit" for the exit. Then, we assigned all sub-functions used by ReadSensor, as well as ReadSensor itself, to the same section, ".exec.body", and modified the MSP430 linker script to place ".exec.body" between the ".exec.entry" and ".exec.exit" sections. The modified linker script is shown in Figure 12b. This way, we ensure that the entry and exit functions are located at the beginning and the end of the executable, respectively, and thus the resulting executable conforms with VAPE's requirement. The overhead of this approach is small: it adds a constant 10 bytes to the instrumented executable.
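The section assignment described above can be sketched in C with the GCC/Clang `section` function attribute. This is not the code of Figure 12: the `EXEC_SECTION` macro, the `sample_pin` sub-function, the stub names, and the function bodies are hypothetical placeholders. The sketch shows only how functions are tagged with the three section names; the ordering of the sections is left to the linker script, and the control transfer that makes the final return instruction reside in ".exec.exit" is not modeled.

```c
#include <stdint.h>

/* Tag a function with a named section on ELF toolchains; `noinline`
 * keeps sub-functions as single shared copies, `used` keeps the stubs
 * from being discarded. Elsewhere the macro expands to nothing. */
#if defined(__GNUC__) && defined(__ELF__)
#define EXEC_SECTION(name) __attribute__((section(name), noinline, used))
#else
#define EXEC_SECTION(name)
#endif

/* Hypothetical sub-function, standing in for helpers such as digitalWrite. */
EXEC_SECTION(".exec.body") uint16_t sample_pin(uint8_t pin)
{
    return (uint16_t)(pin * 2u);   /* placeholder for real I/O */
}

/* Function whose execution is to be proven; shares ".exec.body". */
EXEC_SECTION(".exec.body") uint16_t ReadSensor(uint8_t pin)
{
    return (uint16_t)(sample_pin(pin) + 1u);
}

/* Exit stub, to be placed last by the linker script. In this portable
 * sketch it simply returns to its caller, so the last instruction that
 * executes is NOT in ".exec.exit"; a real stub needs target-specific code. */
EXEC_SECTION(".exec.exit") void ReadSensorExit(void)
{
}

/* Entry stub, to be placed first by the linker script. */
EXEC_SECTION(".exec.entry") uint16_t ReadSensorEntry(uint8_t pin)
{
    uint16_t out = ReadSensor(pin);
    ReadSensorExit();
    return out;
}
```

Keeping the sub-functions as ordinary non-inlined functions in a shared section is what avoids the code duplication of the inlining approach, at the price of two small stubs.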
