Trusted execution environments (TEEs) are becoming a requirement across a wide range of platforms, from embedded sensors to cloud servers, which encompass a wide range of cost and power constraints as well as security threat models. Unfortunately, each of the current vendor-specific TEEs makes a fixed choice in each of the design dimensions of deployability, trusted computing base (TCB), and threat model, with little room for customization and experimentation. To provide more flexibility, we present Keystone, the first open-source framework for building customized TEEs. Keystone uses a simple abstraction of memory isolation together with a programmable layer that sits underneath untrusted components, such as the OS. We demonstrate that this is sufficient to build reusable TEE core primitives, separate from platform-specific modifications and required application features.
1 Introduction
The last decade has seen the proliferation of trusted execution environments (TEEs) to protect sensitive code and data. All major CPU vendors have rolled out their own forms of TEE to create a secure execution environment, commonly referred to as an enclave, including ARM TrustZone, Intel SGX, and AMD SEV [14, 17, 72]. On the consumer end, TEEs are now being used for secure cloud services [18, 24], databases [84], big data computations [27, 45, 89], secure banking [67], blockchain consensus protocols [25], smart contracts [37, 68, 100], machine learning [78, 96], network middleboxes [52], and so on. These use-cases have diverse deployment environments, including cloud servers, client devices, mobile platforms, network ISPs, IoT devices, and hardware tokens.
Unfortunately, each vendor TEE represents a fixed choice in the design space of deployability, trusted computing base (TCB), and threat model, leaving little room for customization. When platform provisioners and enclave programmers choose a TEE, they are locked into a limited set of possibilities even when their use-case demands flexibility. It is therefore not surprising that the current constraints within each vendor's framework pose challenges for customizations. For example, Intel SGX has a hard limit of 128 MB on the maximum size of secure memory [40]. This limitation has resulted in drastic slow-downs in large scale applications [18, 24, 34, 84, 90]. As much as the cloud providers want to use a secure enclave like SGX, they cannot address the limitation themselves and instead have to wait for Intel to increase the memory limit [71]. Similarly, there are several other challenges that remain unaddressed (e.g., dynamic resizing, secure I/O, side-channel attacks). Since it is in practice impossible to make changes to a proprietary ISA, the only path to solving these challenges is to re-implement TEE functionality from scratch in more open ISAs (e.g., OpenSPARC [33, 66], RISC-V [41, 87]), but each redesign requires considerable effort and only serves to create another fixed design point.
These observations motivate our work to develop customizable TEEs to provide a better interface between hardware manufacturers, platform providers, and enclave developers. Customizable TEEs promise quick prototyping, shorter turnaround time for fixes, adaptation to new threat models, and use-case specific deployment. We can draw an analogy with the move from hardware-based networking solutions to Software Defined Networking (SDN). We believe a similar paradigm shift in TEEs will pave the way for effortless TEE customization, allowing security to be tuned for each platform and use case.
Keystone is available at https://keystone-enclave.org/
Specifically, hardware manufacturers can architect platforms that support the same baseline TEE standards, while platform providers can purchase and customize these TEEs, as in SDNs, to cater to diverse deployment scenarios and supported applications. Lastly, the enclave developers can choose from several such TEEs and lower their TCB via customization.
We observe that the first challenge in realizing this vision is the lack of a programmable trusted layer below the untrusted OS. Existing firmware models do not generally satisfy this requirement. Programming at the micro-code layer is near-impossible because the existing hardware is inflexible, hard to customize, and closed-source. Thus, most current solutions emulate programmability via a bloated trusted hypervisor layer. Second, previous attempts at re-using ISA primitives to give TEE guarantees depend on the underlying hardware for memory isolation. These approaches are limited by the lack of flexibility in changing the hardware-defined isolation boundaries. For example, Intel SGX enforces isolation by a fixed-sized address range for encrypted RAM, and ARM TrustZone supports a custom-sized secure world but does not allow sharing of address spaces. Our insight is that given an inbuilt hardware memory isolation primitive and a software-programmable layer with the appropriate privileges, we can build a customizable TEE that is simple and flexible.
To this end, we propose Keystone, the first open-source framework for building customizable TEEs. Our design builds essential TEE primitives from minimal hardware abstractions such as memory isolation. In doing so, we make several conscious design choices to ensure modularity. We leverage the programmable layer exposed by the underlying hardware and demonstrate how one can build a framework which can provision on-demand, rich, and modular customizable TEEs. Keystone allows hardware manufacturers, platform providers, and enclave developers to configure various design choices such as TCB, threat models, workloads, and TEE functionality via our novel plugin interface. We implement Keystone on RISC-V, an open-source hardware ISA. RISC-V supports a machine mode (M-mode) which is equivalent to the micro-code layer of CISC architectures. M-mode is programmable and is thus a perfect place for adding TEE support. Further, RISC-V has recently added standard specifications [16] for physical memory protection (PMP), a memory isolation primitive which allows M-mode to specify arbitrary protections on physical memory regions. Thus, RISC-V fulfills the desired requirements and is a natural fit.
The Keystone framework provides standard TEE guarantees such as secure boot and memory isolation by default. We build additional plugins to enable attestation, secure source of randomness, secure timers, system call interface, and standard libc support inside the enclave. We demonstrate the power of our programmable design by adding support for enclave-managed free memory and dynamic on-demand scaling of enclave size. By virtue of our modular design, which allows us to transparently leverage additional hardware features, we build a cache side-channel defense for platforms which support cache partitioning. Lastly, we build two representative TEE enclave runtimes: a custom modular minimal-TCB runtime (Eyrie) and an off-the-shelf microkernel (seL4 [61]). We extensively benchmark Keystone on a generic RISC-V suite (RV8), an IO-intensive workload (IOZone), and two CPU-intensive workloads (Beebs and CoreMark). We showcase use-case studies where Keystone can be used for secure machine learning (Torch and FANN frameworks) and cryptographic tasks (sodium library) on embedded devices and cloud servers. Lastly, we demonstrate Keystone on different RISC-V platforms: three in-order cores, one out-of-order core, a QEMU emulation, and an FPGA implementation. Keystone is open-source and available online [3].
Contributions. We make the following contributions:
• 
Problem & Approach
We motivate the need for customizable TEEs which can adapt to various design requirements prevalent in real deployment scenarios. We then highlight the main insights that make customizable TEEs feasible and the necessary abstractions.
Motivation. Although TEE systems are widely used, the threat models, TCB size, expressiveness, target devices, and application workloads that these solutions target are points in the design space covering only a subset of useful configurations. For instance, consider a scenario where one wants to deploy a secure IoT edge device which collects sensor data at a hub and relays data analytics to a remote server. This requires a TEE that can execute on memory and energy-constrained embedded devices while supporting secure network communications in the enclave, ideally with a low TCB. This does not overlap with any of the existing design points in the current TEE solution space. Thus, end users (e.g., platform provider, enclave developer) are left with the choice of either compromising on one of the design requirements (e.g., resorting to a large TCB solution) or taking on the onus of building their own custom design.
Customizable TEEs. We define a new paradigm, customizable TEEs, wherein both the platform provider and the enclave programmer can customize what primitives and guarantees a TEE should employ. An analogous concept to customizable TEEs is SDN, where the ability to easily program the control layer opens up far more flexibility in deployment. In our model, the manufacturer of the platform is not the sole arbiter of the TEE design and instead only provides a set of primitives for the platform provider to programmatically interact with. The task of realizing a specific TEE instance is then a combination of the platform provider's implementation of the hardware interfacing and trust model, and the enclave programmer's feature requirements. Their choices are offloaded to a framework which plugs in required components and composes the expected TEE.
Customizable TEEs bridge existing gaps in the TEE design space with a framework wherein the users and researchers can independently explore design trade-offs without significant development effort. Additionally, they allow for rapid response to vulnerabilities and new feature requirements.
Abstractions for Customizable TEEs
The goal of customizable TEEs is to avoid enumeration of all possible points in the design space while allowing instantiation of any given point. We outline the important abstractions necessary for customizable TEEs. A customizable TEE has three logical actors. The platform builder develops and fabricates the hardware. The platform provider operates the hardware and makes it available to a user. The user/enclave programmer develops software and runs it on hardware provided by the platform provider.
Hardware-enforced Memory Isolation. The hardware must provide flexible memory isolation as a native primitive, where isolation must be customizable at runtime but only by a suitably privileged mode.
Programmable Mode Below & Above Untrusted Code. A trusted entity executing on the hardware should be able to manage and provision isolated memory regions. Specifically, the hardware should allow programming of trusted software which has higher privileges than the untrusted components (e.g., OS, hypervisor). This allows the customizable TEE framework to define custom isolation policies in software while offloading their enforcement to the hardware. This mode must also have minimal non-security responsibilities to minimize TCB and present clean abstractions. Most commodity platforms implement this highest privilege mode as firmware or microcode, which do not meet our programmability requirements.
Modularity. Each layer should be independent and provide an abstraction to the layers above it. For example, the trusted software defines the isolation boundaries, while leaving the complex resource management within each isolation to the untrusted components. Further, each layer should enforce a set of guarantees, which can be easily checked by the lower layers. For example, the software manages address spaces while the hardware merely enforces configured isolation. Lastly, the framework should maintain compatibility with existing notions of privilege levels. For example, applications that are programmed to execute in the user-level code should not be expected to re-implement kernel-level functionality in order to execute in a TEE environment.
To this end, we envision a design of customizable TEEs which allows for the maximum degree of freedom with minimal effort. More importantly, to enable this new paradigm, we need hardware which provides the above abstractions.
Customizable TEEs with RISC-V. RISC-V is an open ISA with multiple open-source core implementations [19, 32]. RISC-V currently supports up to three privilege modes: U-mode (user) for normal user-space processes, S-mode (supervisor) for the kernel, and M-mode (machine), which can directly manage physical resources such as I/O, memory, or devices. Keystone utilizes three aspects of M-mode. First, M-mode is programmable by platform operators and meets our needs for a minimal highest privilege mode. Second, M-mode controls hardware delegation of the interrupts and exceptions in the system. Lower privilege modes receive exceptions or use CPU cycles only when M-mode allows it. Third, physical memory protection (PMP), a recently introduced feature in the RISC-V privileged ISA, allows M-mode to apply simple access policies to physical memory. This enables M-mode to isolate regions of memory at runtime or disable access to memory-mapped control features. Only M-mode may configure the boundaries and access to PMP regions. Thus, RISC-V provides the abstractions required to build customizable TEEs.
Design Dimensions for Customizable TEEs
We next outline design dimensions of customizable TEEs.
Platform Support & Workloads. There is a stark split in the TEE platforms and the workloads they support. TrustZone is deemed better suited for embedded devices and mobile platforms, partly because most of these devices have ARM processors. Intel SGX has become a popular TEE for server applications [17]. AMD SEV is targeted at securing large workloads and has seen adoption in virtual machines [14]. As opposed to these workload-targeted platform designs, Keystone can support various workloads and frees platform providers and enclave developers from inadvertent lock-in to a specific TEE design based on their workloads.
TCB Size & Expressiveness. Even within a TEE, there is a diverse set of functionality one may want to execute. For example, server applications can vary from simple in-memory databases which do not use fork [18] to complex databases with load-balancing via fork-exec [34, 84]. Given the various enclave programming frameworks available, a cloud service provider may want to choose the one with maximum expressiveness to support a large fraction of user applications. The downside of this approach is that the enclave TCB increases with complex functional support, irrespective of whether the application uses the functionality. Thus, the same database can be executed with a TCB of a few thousand LOC [18] or a million LOC [24, 34], depending on the design choice of the underlying programming infrastructure. With Keystone's modular design, the platform provider can instantiate a TEE with the minimal TCB required to execute the enclave. The enclave programmer can further optimize the TCB by on-the-fly customization of what is included in the enclave based on the needs of the enclave logic.
Threat Models. Enclaves are primarily designed to prevent an attacker from compromising the confidentiality and integrity of the enclave logic. However, the attack surface depends on the environment which hosts the enclaves. A physical attacker can intercept, modify, or replay signals that leave the chip package. Keystone assumes that the physical attacker is not capable of intercepting or modifying on-chip signals. A software attacker can control host applications, the untrusted OS, and network communications; launch adversarial enclaves; arbitrarily modify any memory not protected by the TEE; and add, drop, or replay enclave messages. A side-channel attacker can glean information by passively observing interactions between the trusted and the untrusted components. Not all platforms will be vulnerable to all side-channel attacks because they may not contain shared/multiple caches and out-of-order execution. A denial-of-service attacker can attempt to take down the enclave or the host OS at any time. Keystone allows denial-of-service attacks against enclaves because traditionally the OS can refuse services to user applications at any time. On the other hand, a DoS attack from the enclave is undesirable.
Keystone allows flexible usage of the underlying defense mechanisms to adapt against a subset of the threat model. One might argue that the TEE should always defend against as many possible threat models at all times. However, if both the platform provider and the enclave developer are sure that a certain class of attacks is out of scope, turning off a defense mechanism can return significant performance or power improvements. For example, if the user is deploying TEEs in their own private data centers or home appliances, a physical attacker may not be a threat. Current individual TEE proposals [17, 26, 41, 72] have distinct threat models, but are restricted to a specific choice.
Any Keystone TEE instance will guarantee confidentiality and integrity of enclave code and data from a software attacker.
In summary, Keystone allows platform operators and developers to customize design aspects of TEEs.
Keystone Design
We discuss the Keystone design components, outline our design principles, and show how we enable customizable TEEs.
Overview
The Keystone usage model is that an untrusted OS executes an untrusted host application, which in turn launches one or more trusted enclaves (similar to the SGX usage model). In addition, Keystone introduces two new components, namely a security monitor and a runtime. Figure 1 shows the details of the interactions between these components.
Building TEE Primitives from Hardware Abstractions. Keystone uses the isolation and programmable mode provided by RISC-V to build critical TEE guarantees. Specifically, we design a security monitor (SM) which enforces the TEE guarantees across the entire platform. The SM executes in M-mode and is programmed in C/assembly. For customization, the platform operator specifies the SM configuration and trusts that the SM will enforce all relevant security boundaries between enclaves and the host/OS. Such a design allows Keystone to transparently extend the guarantees given by the hardware. For example, if the device has additional features (e.g., cache partitioning, crypto engine, source of randomness), the SM can enforce additional protection or improve the performance without changing the rest of the layers (e.g., OS, user applications). More importantly, it ensures that untrusted layers (e.g., OS) executing above the SM cannot circumvent enforcement.
On-demand, Rich, and Modular Programming Model. A Keystone enclave comprises a runtime (RT) which executes in S-mode and an enclave application (eapp) executing in U-mode. Both RT and eapp are part of the enclave address space which is isolated from the untrusted OS and other user applications. Most importantly, the RT is specified by the user and can be configured according to the needs of the target eapp it needs to support. The RT manages the lifecycle of the user code executing in the enclave, manages memory, services syscalls, etc. Specifically, the RT uses a limited set of API functions exposed by M-mode via the RISC-V supervisor binary interface (SBI) to exit or pause the enclave (see Table 1) and to request the SM to perform operations on behalf of the eapp (e.g., attestation, get random values). Note that each enclave instance may choose its own RT configuration which is not shared with any other enclave. This separation allows for significant modularity in application support for Keystone while decreasing the per-eapp TCB.
Maintaining Developer-friendly Interface. Keystone can execute entire application logic or developer-specified parts as an eapp. Keystone does not allow the eapp to directly access any untrusted memory or use untrusted OS APIs (e.g., system calls). This restriction may break the common program semantics or, even worse, force the eapp programmer to re-design their application specifically for Keystone. Our design goal is to avoid any such additional effort. Thus, we follow the design principle of maintaining strict compatibility with existing user-code programming notions. We bridge the gap between Keystone restrictions and traditional programs via the RT. Specifically, an eapp may continue to assume a fully functional OS, and the RT can provide the necessary safe functionality. This design allows for the use of unmodified eapp programs in Keystone with a sufficiently feature-rich RT.
Keystone Memory Isolation Primitives
We describe how the simple design of PMP is sufficient to implement flexible memory isolation of Keystone enclaves.
Figure 2. How Keystone uses RISC-V PMP for flexible, dynamic memory isolation. The SM uses a few PMP entries to guard its own memory (SM) and enclave memories (E1, E2). Upon enclave entry, the SM will reconfigure the PMP such that the enclave can only access its own memory (E1) and the untrusted buffer (U1).
Background: RISC-V Physical Memory Protection. Keystone uses the physical memory protection (PMP) feature provided by RISC-V. PMP allows restrictions on the physical memory access of S-mode and U-mode to certain regions defined via PMP entries. 2 Figure 2 shows the PMP details. Each PMP entry controls the U-mode and S-mode permissions to a customizable region in the physical memory. 3 Specifically, the PMP address registers encode the address of a contiguous physical region. The PMP configuration bits specify the r-w-x permissions and two addressing mode bits. Each permission bit determines whether U-mode or S-mode may perform the corresponding operation on the memory region. PMP has three addressing modes (TOR, NA4, and NAPOT in the RISC-V privileged specification), so a region can be either naturally aligned with a power-of-two size or bounded by arbitrary addresses. PMP entries are statically prioritized: a lower numbered PMP entry has higher matching priority, so it can overrule the permission bits of a higher numbered entry.
2 pmpaddr and pmpcfg CSRs are used to specify PMP entries.
3 Currently, a RISC-V processor may have up to 16 PMP entries, which can be configured by M-mode.
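As an illustration of the priority rule, the following Python sketch models PMP matching at the level of detail used in the text. It is a simplified model under stated assumptions, not the hardware algorithm: the `PmpEntry` fields, the `pmp_check` helper, and the example addresses are hypothetical names and values introduced here, and the real pmpaddr/pmpcfg encodings, lock bits, and accesses that straddle two entries are ignored.

```python
from dataclasses import dataclass


@dataclass
class PmpEntry:
    base: int    # first byte of the physical region
    size: int    # region length in bytes
    perms: str   # subset of "rwx" granted to U-mode and S-mode


def pmp_check(entries, addr, access):
    """Return True if a U/S-mode access ('r', 'w', or 'x') to addr is allowed.

    entries[0] has the highest priority; the first entry that covers the
    address decides, regardless of what later entries say.
    """
    for entry in entries:
        if entry.base <= addr < entry.base + entry.size:
            return access in entry.perms
    # Per the RISC-V privileged specification, a U/S-mode access that
    # matches no entry is denied.
    return False


# Hypothetical layout: a protected region in front of a permissive catch-all.
entries = [
    PmpEntry(base=0x8000_0000, size=0x20_0000, perms=""),       # protected
    PmpEntry(base=0x8000_0000, size=0x4000_0000, perms="rwx"),  # catch-all
]
```

With this layout, an address inside the first region is denied even though the catch-all entry would permit it, which is the overruling behavior described above.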
Enforcing Memory Isolation via SM. PMP makes Keystone memory isolation enforcement flexible for three reasons. First, the number of enclave regions that can be configured is proportional to the number of PMP entries in the CPU. Thus, multiple discontiguous enclave memory regions can coexist, instead of reserving one large memory region shared by all enclaves. Second, PMP supports flexible addressing modes. A memory region can cover a region as small as 4 bytes, or as large as the entire DRAM. This enables Keystone enclaves to utilize page-aligned memory with an arbitrary size. Lastly, PMP entries can be dynamically reconfigured during execution. Instead of occupying a set of memory regions permanently, Keystone can dynamically create a new region, or release a region to the operating system as needed. Figure 2 shows how Keystone utilizes PMP entries for memory isolation. Specifically, when the SM boots, it configures the first PMP entry (highest priority) to cover its own memory region containing the code, stack, and the secure data such as enclave metadata and keys. Further, the SM disallows access to its memory from U-mode and S-mode. The SM then configures the last PMP entry (lowest priority) to cover all memory and with all permissions enabled, such that the OS can access the non-SM memory.
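To make the addressing-mode flexibility concrete, the sketch below encodes and decodes one of the addressing modes defined by the RISC-V privileged specification, NAPOT (naturally aligned power of two), in which the low bits of the address register carry the region size. The function names and the example region are assumptions introduced here for illustration; this is not code from the Keystone SM.

```python
def napot_encode(base, size):
    """Encode a naturally aligned power-of-two region as a pmpaddr value.

    The register holds address bits [..:2]; a run of t trailing one bits
    followed by a zero denotes a region of 2**(t + 3) bytes.
    """
    if size < 8 or size & (size - 1):
        raise ValueError("NAPOT size must be a power of two of at least 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to size")
    return (base >> 2) | ((size >> 3) - 1)


def napot_decode(pmpaddr):
    """Recover (base, size) from a NAPOT-encoded pmpaddr value."""
    trailing_ones = 0
    while (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 1 << (trailing_ones + 3)
    # Drop the size bits and the terminating zero, then restore the address.
    base = (pmpaddr >> (trailing_ones + 1)) << (trailing_ones + 3)
    return base, size


# Hypothetical 2 MiB region starting at 0x8000_0000.
encoded = napot_encode(0x8000_0000, 0x20_0000)
```

Regions that are not a power of two in size or not naturally aligned cannot be expressed this way and would need the range-based (TOR) mode, which uses two adjacent address registers as bounds.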
When a host application requests to create an enclave, the OS is in charge of allocating a contiguous physical region. Upon successful creation of an enclave, the SM protects the enclave memory by adding a PMP entry with all permissions disabled. Since the enclave's PMP entry has a higher priority than the OS PMP entry (the last entry in Figure 2), the OS or any other user processes cannot access the enclave region. Further, the enclave regions are not allowed to overlap with each other or with the SM region.
When the CPU transfers control to enclave execution, the SM flips the PMP permission bits of the relevant enclave memory region. At the same time, the SM also removes all permissions from the OS PMP entry to protect the OS memory from the enclave. Both these steps together allow the enclave to access its own memory safely. When the CPU performs a context switch from enclave to non-enclave, the SM flips the permission bits and isolates the enclave memory from the OS. Enclave PMP entries are freed on enclave destruction.
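The sequence of PMP changes on creation, entry, exit, and destruction can be summarized in a toy model. The class and method names below are assumptions made for this sketch, regions are identified by name rather than by address, and the untrusted buffer shown in Figure 2 is left out for brevity.

```python
class SecurityMonitorPmp:
    """Toy model of one core's PMP entries as managed by the SM."""

    def __init__(self):
        # Highest priority first: the SM's own region, then the OS
        # catch-all entry that covers all of memory.
        self.entries = [["SM", ""], ["OS", "rwx"]]

    def _find(self, name):
        return next(entry for entry in self.entries if entry[0] == name)

    def create_enclave(self, name):
        # Enclave entries sit above the OS entry and start with no permissions.
        self.entries.insert(len(self.entries) - 1, [name, ""])

    def enter_enclave(self, name):
        self._find(name)[1] = "rwx"  # enclave may access its own memory
        self._find("OS")[1] = ""     # everything else is hidden from it

    def exit_enclave(self, name):
        self._find(name)[1] = ""     # enclave memory is sealed off again
        self._find("OS")[1] = "rwx"  # OS regains access to non-enclave memory

    def destroy_enclave(self, name):
        self.entries.remove(self._find(name))

    def can_access(self, region):
        # The OS entry covers all memory, so it matches every region;
        # the first matching entry decides.
        for name, perms in self.entries:
            if name == region or name == "OS":
                return perms == "rwx"
        return False


sm = SecurityMonitorPmp()
sm.create_enclave("E1")
sm.create_enclave("E2")
```

While E1 runs, only the E1 region is accessible to S-mode and U-mode code on that core; the SM region, the OS memory, and E2 all resolve to entries with no permissions.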
PMP Enforcement Across Cores. Each core has its own complete set of PMP entries. During enclave creation, Keystone adds a PMP entry to disallow everyone from accessing the enclave. These changes during creation must be propagated to all the cores via inter-processor interrupts (IPIs). This ensures that the other cores are disallowed from accessing the enclave. During enclave execution, changes to the PMP entries (e.g., context switches between enclave and host) are local to the core executing it and need not be propagated to other cores. When Keystone destroys the enclave, all the other cores are notified to update their PMP entries. There are no other times that PMPs must be synchronized via IPIs.
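The synchronization rule can be illustrated with a small model that counts IPIs. The `Platform` class, its method names, and the use of a dictionary for each core's enclave entries are assumptions of this sketch rather than Keystone's data structures.

```python
class Platform:
    """Toy model: every core keeps its own copy of the enclave PMP entries."""

    def __init__(self, num_cores):
        self.cores = [dict() for _ in range(num_cores)]  # enclave id -> perms
        self.ipis_sent = 0

    def _update_all(self, origin, update):
        for core_id, pmp in enumerate(self.cores):
            if core_id != origin:
                self.ipis_sent += 1  # remote cores are reached via an IPI
            update(pmp)

    def create(self, origin, eid):
        # Creation is global: every core must deny access to the new region.
        self._update_all(origin, lambda pmp: pmp.update({eid: ""}))

    def destroy(self, origin, eid):
        # Destruction is global as well: every core frees the entry.
        self._update_all(origin, lambda pmp: pmp.pop(eid))

    def enter(self, core_id, eid):
        self.cores[core_id][eid] = "rwx"  # local change only, no IPI

    def exit(self, core_id, eid):
        self.cores[core_id][eid] = ""     # local change only, no IPI


platform = Platform(num_cores=4)
platform.create(origin=0, eid="E1")
platform.enter(core_id=1, eid="E1")
```

On a four-core platform, creation and destruction each cost three IPIs, while any number of enclave entries and exits cost none.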
Other Keystone TEE Primitives
We outline the remaining standard TEE primitives supported in Keystone. Keystone provides well-defined interfaces for each of the following components. Our implementation directly supports secure boot, trusted timer, and secure source of randomness. For the remaining primitives, Keystone provides stubs which can be used to integrate existing solutions.
Secure Boot. A Keystone root-of-trust can be either tamper-proof software (e.g., a zeroth-order bootloader) or hardware (e.g., a crypto engine). At each CPU reset, the root-of-trust measures the SM image, generates a fresh attestation key from a cryptographically secure source of randomness, and stores it in the SM private memory. The root-of-trust then signs the measurement and the public key with a hardware-visible secret. These are standard operations that can be implemented in numerous ways [59, 64], and Keystone does not rely on a specific implementation. Currently, Keystone simulates secure boot via a modified first-stage bootloader that performs all the above steps.
Secure Source of Randomness. The Keystone framework provides access to a trusted source of randomness via the SM SBI (random), which returns a 64-bit random value. Keystone should use a hardware source of randomness to fulfill requests if available, or can fall back to well-known options (e.g., CPU jitter) or exotic methods (e.g., DRAM decay [73]).
Trusted Timer. Keystone allows the enclaves to access the timer registers maintained by the RISC-V hardware via the rdcycle instruction. More importantly, these timers are not writable, and may not be modified by the untrusted OS. The SM also supports a set of standard timer SBI calls.
Remote Attestation. The Keystone SM performs measurement and attestation based on the provisioned key. When an enclave asks for an attestation report, the SM signs the enclave measurement with the attestation key and copies the signature to the enclave memory. Keystone uses a standard scheme to bind the attestation with a secure channel construction [48, 64].
The SM allows the enclave to include a limited amount of arbitrary data (e.g., Diffie-Hellman key parameters) in the signed attestation report. Key distribution [15], revocation [54], attestation services [56], and anonymous attestation [28] are orthogonal and can be implemented on top of Keystone.
Secure Delegation. Keystone delegates traps (i.e., interrupts and exceptions) raised during enclave execution to the RT via RISC-V hardware interrupt delegation registers. While doing so, Keystone ensures that it does not directly leak any enclave state and does not allow the OS to start executing at arbitrary entry points in the enclave. For user-defined exceptions, the RT invokes the appropriate handler inside the enclave. For others, the RT forwards them to the untrusted OS via the SM. The SM performs the context switch by (a) saving the enclave context, including general purpose and control-and-status registers (CSRs); and (b) returning the execution control to the untrusted OS to handle the trap. The OS can then return the execution control to the enclave via the SM SBI, where the SM examines the OS return values, sanitizes them to ensure they are safe to deliver to the enclave, and resumes the enclave execution.
Monotonic Counters & Sealed Storage. Enclaves may need monotonic counters for protection against rollback attacks and versioning [40]. The Keystone framework can support monotonic counters by keeping the counter state in the SM memory. This is safe because the OS cannot overwrite or remove data held in M-mode memory, where the SM executes. In the future, the Keystone interface can interact with non-volatile memory for persistent counters [91] as well as TPM and NVRAM for better latencies and durability [81, 92]. Going forward, Keystone can support sealed storage so that the eapp can use untrusted memory or storage devices to store encrypted content which is tied to the enclave identity.
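A minimal sketch of how an SM-held monotonic counter can detect rollback is given below. The `SmCounters` class and the `is_fresh` check are hypothetical names introduced for illustration; the sketch assumes that an enclave stamps each saved state with the counter value at the time of saving, which is one common use of such counters and not necessarily how Keystone enclaves use them.

```python
class SmCounters:
    """Per-enclave counters kept in SM-private memory (modeled as a dict)."""

    def __init__(self):
        self._values = {}

    def increment(self, enclave_id):
        # The only mutating operation: counters never decrease or reset.
        self._values[enclave_id] = self._values.get(enclave_id, 0) + 1
        return self._values[enclave_id]

    def read(self, enclave_id):
        return self._values.get(enclave_id, 0)


def is_fresh(saved_version, counters, enclave_id):
    """A saved state is current only if its version equals the counter."""
    return saved_version == counters.read(enclave_id)


counters = SmCounters()
old_version = counters.increment("E1")  # enclave saves state, version 1
new_version = counters.increment("E1")  # enclave saves newer state, version 2
```

If the untrusted OS later hands the enclave the older saved state, the version mismatch reveals the rollback. Because this model keeps counters only in volatile SM memory, they would not survive a reset, which is why the text points to non-volatile storage as future work.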
Enclave Lifecycle
We summarize the end-to-end life cycle of a Keystone enclave in Figure 3 , including the key steps in the enclave lifetime and the corresponding PMP changes. 
Keystone Framework
We describe the usage, programming, and deployment model of our novel Keystone framework. In doing so, we highlight the programmability aspect of our framework and how our modular design allows us to transparently extend Keystone functionality with plugins. Figure 4 shows the steps involved in an end-to-end eapp deployment. The platform builder provides a RISC-V core with PMP support, secure key storage, and may add other hardware functionality (e.g., source of secure randomness, caches, memory encryption engines). The platform provider deploys RISC-V cores via this off-the-shelf RISC-V hardware. To instantiate customizable TEEs with Keystone, the provider configures the SM for core TEE primitives (e.g., memory isolation). They may also configure SM plugins which expose additional functionality of the platform.
Figure 4. Keystone End-to-end Overview. The platform provider configures the SM. Keystone compiles and generates the SM boot image. The platform provider deploys the SM. The developer writes an eapp and configures the enclave. Keystone builds the binaries and computes measurements. The untrusted host binary is deployed to the machine. The host deploys the RT and the eapp and initiates the enclave creation. A remote verifier can perform attestation based on the known platform specifications, keys, and SM/enclave measurements.
The Keystone framework provides RTs which interface with the platform provider's SM to provide user code abstractions to an eapp. Additionally, RT plugins may be used to provide syscall interfaces and memory management features. The enclave programmer writes eapps, and can configure what platforms they expect to execute on, what SM and RT they accept, and the set of plugins the eapp requires. The developer specifies all such requirements in an eapp configuration file and carries out the rest of the development procedure as for any other non-enclave application. The developer can choose to execute existing non-enclave applications inside an enclave without any modification. Otherwise, they can use the Keystone SDK, which provides common enclave utilities to write native Keystone eapps.
Keystone acts as an intermediary between the above entities. Specifically, our framework takes in the eapp source code or binaries, along with a configuration specifying the RT, architecture features, and the plugins that the enclave will require at the RT level.
The Keystone toolchain can then be used to compile an RT for the eapp, as well as a corresponding host application. Developers can deploy their eapp by launching a Keystone enclave on a platform of their choice. The enclave can send a signed enclave measurement (an attestation report) to remote parties and prove that the platform provider is indeed executing on authentic hardware with a known SM, and the expected RT and eapp.
Plugin Support. Keystone provides all the basic building blocks for customizable TEEs described in Section 3.2. We point out that a programmable M-mode along with per-enclave RTs allows Keystone to enable a variety of memory management, functionality, and security features in the SM as well as the RT. We now demonstrate various Keystone plugins we have built on top of Keystone building blocks.
Enclave Memory Management Plugins
Keystone enclaves may occupy anywhere from several tens of KBs up to several GBs as long as the OS can reserve a contiguous physical memory region for the enclave at the time of creation. This flexibility in enclave physical memory size enables a wider range of applications and use cases for Keystone. For example, one can run batch workloads such as deep learning inferencing with a large enclave size in order to avoid paging overheads. In the default Keystone design, after the OS gives contiguous physical memory to an enclave, the enclave manages this memory using the RT's S-mode privileges. Thus, Keystone does not rely on the OS for virtual memory management other than the limited initial mappings. Note that the enclave has to specify the maximum physical memory size it needs, as well as the size of the stack, heap, code, and data sections, so that the OS can create initial virtual-to-physical mappings for the enclave.
This requirement, although suitable for small eapps, may limit some larger eapps. To this end, we describe several memory management plugins which add additional virtual memory support to the RT. The eapp developer, if required, can optionally enable these plugins in the RT to alleviate the default-mode restrictions. Relevant SM plugins must be enabled by the platform provider.
Free Memory. We introduce a plugin which allows the enclave to reserve physical memory without mapping it to any virtual address by specifying a new region called free memory. Free memory is not associated with any virtual address during initialization and is not included in the enclave measurement. Instead, the RT simply ensures that free memory is zeroed before beginning the eapp execution. When the eapp asks for more stack or heap memory during its execution, the RT uses pages from the free memory region to dynamically allocate stack or heap pages, create anonymous mmaps, and execute binaries which do not have a static virtual layout. This free memory plugin allows the eapp to make better use of the fixed-size enclave memory.
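The free memory idea can be sketched as a toy page allocator. This is a minimal model under stated assumptions, not the Eyrie implementation: the class `FreeMemoryRuntime`, its methods, and the 4 KB page size are hypothetical, and page tables are reduced to a dictionary.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class FreeMemoryRuntime:
    """Toy model of an RT handing out pages from an unmapped, zeroed region."""

    def __init__(self, num_free_pages):
        # The region is zeroed before the eapp starts and has no virtual mappings.
        self.free_region = bytearray(num_free_pages * PAGE_SIZE)
        self.free_pages = list(range(num_free_pages))
        self.page_table = {}  # virtual page number -> page index in free_region

    def map_pages(self, first_vpn, count):
        """Back `count` virtual pages starting at `first_vpn` with free pages."""
        vpns = range(first_vpn, first_vpn + count)
        if any(vpn in self.page_table for vpn in vpns):
            raise ValueError("part of the requested range is already mapped")
        if count > len(self.free_pages):
            raise MemoryError("free memory region exhausted")
        for vpn in vpns:
            self.page_table[vpn] = self.free_pages.pop(0)

    def grow_heap(self, heap_end_vpn, count):
        """Handle a brk-style request: extend the heap and return its new end."""
        self.map_pages(heap_end_vpn, count)
        return heap_end_vpn + count
```

With four free pages, a request to grow the heap by three pages succeeds and leaves one page free; a further request for two pages fails because the region has a fixed size.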
Dynamic Resizing. The size of the physical memory that an eapp uses at runtime depends on the workloads. For example, certain applications (e.g., machine learning inference) may have a fixed memory footprint, whereas other applications (e.g., databases) have arbitrary memory usage. Asking the eapp developer to specify the maximum enclave size and then statically pre-allocating the physical memory for it can prevent scaling and compatibility. To this end, Keystone allows the RT to request dynamic changes to the physical memory boundaries of the enclave. Specifically, the RT may request that the OS make an extend SBI call to add physical pages to the enclave memory region, if the OS is able to do so. If the OS succeeds in such an allocation, the SM increases the size of the enclave and notifies the RT. The RT can then use these pages as a part of the free memory and allocate them to satisfy the eapp requests (e.g., brk). Note that by combining this with the free memory plugin, the eapp can start with a small static stack and heap size in the initial configuration and then scale resources based on workloads instead of pre-determining all the memory needs. Dynamic resizing may be a side-channel for some applications and is entirely optional.
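The resize flow among the RT, the untrusted OS, and the SM can be sketched as follows. This is an illustrative model only: all function and class names are hypothetical, physical memory is reduced to page numbers, and the contiguity check is an assumption about what the SM would validate, based on the requirement that an enclave occupies a contiguous physical region.

```python
class EnclaveRegion:
    """Contiguous physical page range [base, base + size) owned by one enclave."""

    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.rt_free_pages = []  # pages the RT may hand to the eapp


def os_find_extension(region, extra, os_free_pages):
    """Untrusted OS: offer the pages directly after the region, if all are free."""
    end = region.base + region.size
    wanted = set(range(end, end + extra))
    return wanted if wanted <= os_free_pages else None


def sm_extend(region, offered, os_free_pages):
    """SM: grow the isolated region only if the offer is contiguous with it."""
    end = region.base + region.size
    if not offered or offered != set(range(end, end + len(offered))):
        return False
    os_free_pages.difference_update(offered)  # the OS gives up these pages
    region.size += len(offered)
    region.rt_free_pages.extend(sorted(offered))  # the RT learns of the new pages
    return True


def rt_request_extend(region, extra, os_free_pages):
    """RT: ask the OS for more pages; succeed only if the SM accepts the offer."""
    offered = os_find_extension(region, extra, os_free_pages)
    return offered is not None and sm_extend(region, offered, os_free_pages)
```

For an enclave on pages 6 to 9 with pages 10 to 12 free, a three-page request succeeds and the new pages join the RT's free list; a later request fails once the next adjacent page is unavailable, and the enclave size is unchanged.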
Functionality Plugins
Edge Call Interface. The eapp cannot access non-enclave memory in Keystone, so if it needs to read or write data outside the enclave (e.g., sending attestation reports), the RT can perform edge calls on its behalf. Our edge call plugin is functionally similar to RPC. Specifically, an edge call consists of an index to a function implemented in the untrusted host application and the parameters to be passed to the function.
The RT tunnels such a call safely to the untrusted host, copies the return values of the function back to the enclave, and sends them to the eapp. The copying mechanism requires the RT to have access to a buffer shared with the OS, which is set up as follows: (a) before the SM creates the enclave, the OS allocates a shared buffer in its memory space and passes the address to the SM; (b) the SM then binds the address to the enclave so that the enclave RT can access the memory; (c) the SM uses a PMP entry to control the shared buffer permissions. We force all edge calls to pass through the RT by not making the shared memory virtual mappings available to the eapp.
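The RPC-like shape of an edge call (a function index plus parameters, exchanged through a shared buffer) can be sketched as below. The JSON encoding, the buffer size, and every name here are assumptions made for illustration; the text does not specify the wire format used by Keystone.

```python
import json


def write_buffer(shared, payload):
    """Copy payload into the fixed-size shared buffer, zero-filling the rest."""
    if len(payload) > len(shared):
        raise ValueError("payload does not fit in the shared buffer")
    shared[:] = payload.ljust(len(shared), b"\x00")


def read_buffer(shared):
    """Decode the JSON message currently stored in the shared buffer."""
    return json.loads(bytes(shared).rstrip(b"\x00"))


class UntrustedHost:
    """Host side: a table of functions that the enclave may call by index."""

    def __init__(self, table):
        self.table = table

    def serve(self, shared):
        request = read_buffer(shared)
        result = self.table[request["index"]](*request["args"])
        write_buffer(shared, json.dumps({"ret": result}).encode())


def rt_edge_call(shared, host, index, args):
    """RT side: marshal the call, hand it to the host, copy the result back."""
    write_buffer(shared, json.dumps({"index": index, "args": args}).encode())
    host.serve(shared)
    return read_buffer(shared)["ret"]


shared = bytearray(256)  # stands in for the buffer the OS shares with the RT
host = UntrustedHost([lambda text: len(text), lambda a, b: a + b])
total = rt_edge_call(shared, host, 1, [2, 3])
```

Only the RT touches `shared` on the enclave side in this model, mirroring the rule that the eapp has no mapping for the shared buffer. A real RT would additionally have to treat the reply as untrusted input.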
System Calls. System calls operate as a subset of edge calls and Keystone piggybacks on the mechanism described above to tunnel calls between the OS and the eapp. Specifically, the user application invokes the system call on behalf of the eapp, collects the return values, and forwards them to the eapp. The plugin forwards only specific defined system calls. For other syscall operations (mmap, brk, random generation, etc.) Keystone either invokes an SM interface or performs the operation in the RT and returns the values to the eapp.
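The routing decision described above can be sketched as a small lookup. The tables are hypothetical: the text names mmap, brk, and random generation as calls that are not forwarded, but it does not list the forwarded calls, does not say which of the others go to the SM versus the RT, and does not say what happens to unlisted calls. Those choices are assumptions of this sketch.

```python
# Hypothetical routing tables: the text names mmap, brk, and random generation
# as non-forwarded calls but does not list which calls are forwarded.
FORWARDED_TO_HOST = {"open", "read", "write", "close"}
HANDLED_IN_RT = {"mmap", "brk"}
HANDLED_BY_SM = {"getrandom"}


def route_syscall(name):
    """Return which component serves the named system call in this sketch."""
    if name in FORWARDED_TO_HOST:
        return "host"  # tunneled through the edge call mechanism
    if name in HANDLED_IN_RT:
        return "rt"  # served inside the enclave by the runtime
    if name in HANDLED_BY_SM:
        return "sm"  # served through an SM interface
    return "rejected"  # assumption: anything unlisted is refused
```

Keeping the forwarded set explicit means that adding a new proxied call is a deliberate configuration change rather than a default.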
Multi-threading. Keystone supports multi-threaded eapps by scheduling all the threads on the same core. We do not support hyper-threading or parallel multi-core execution of the enclave yet. The RT performs the thread context switch, and all the thread local storage is protected by the enclave.
SM Plugins
We demonstrate a platform-specific side-channel defense which can be completely implemented in the SM. If the platform builder can produce hardware with the required features, the platform provider can configure its usage in deployment. Such plugins are implemented via our platform-specific build options for the SM.
Cache Partitioning. If the platform has a shared cache, it renders the enclaves potentially vulnerable to cache side-channel attacks from the untrusted OS and other applications in other cores. Fortunately, the FU540-C000, one such RISC-V SoC with a shared L2 cache, provides a highly customizable L2 cache controller with a way-masking primitive similar to Intel's CAT [75]. Our plugin uses the controller to partition the cache such that the enclave and the adversary do not share any cache ways. Specifically, the SM plugin combines waymask management and PMP to way-partition the L2 cache transparently to the OS and the enclaves.
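The core invariant of way partitioning is that the enclave's and the host's way masks never overlap. The sketch below computes such masks as plain bit vectors; it is not the FU540 register layout, and the function name and the choice of giving the enclave the low-numbered ways are assumptions. The 16-way, 8-way split matches the 50-50 configuration used in the evaluation.

```python
def partition_ways(total_ways, enclave_ways):
    """Return (enclave_mask, host_mask) with no cache way shared between them."""
    if not 0 < enclave_ways < total_ways:
        raise ValueError("both sides need at least one cache way")
    enclave_mask = (1 << enclave_ways) - 1  # assumption: enclave takes low ways
    host_mask = ((1 << total_ways) - 1) & ~enclave_mask
    return enclave_mask, host_mask


enclave_mask, host_mask = partition_ways(16, 8)
# enclave_mask == 0x00FF and host_mask == 0xFF00: disjoint, covering all 16 ways
```

Because the masks are disjoint, a line filled by the host can never evict a line in the enclave's ways, which is the property the plugin relies on.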
Implementation
We implement our SM on top of the Berkeley Boot Loader (bbl) [8]. We provide a tool which generates the expected measurements of the eapp and RT. We build a simple RT called Eyrie to perform the minimal task of loading and executing enclaves and provide various plugins. We port the seL4 microkernel [61] to Keystone by making 290 LoC changes to the original seL4 code for booting, initializing memory, and interrupt handling in Keystone. Note that Keystone does not limit the developer's choice to these two RTs; developers can write a custom RT if necessary.
We ensure that Keystone can execute on various RISC-V platforms and simulation environments with PMP support. First, Keystone executes on real CPUs (e.g., FU540 [1]) and can be deployed on various IoT devices [2]. Second, Keystone can execute on FPGA implementations of popular RISC-V cores [19, 32] both locally and on commodity cloud platforms which support FPGA instances. This allows developers to perform data-center scale benchmarking and testing. Third, we integrate Keystone changes to RISC-V QEMU [7], which allows developers to debug the RT and the eapp before deployment. Finally, we add support for executing Keystone on a RISC-V cycle-accurate simulation platform [57] to introspect the execution of the eapp, RT, and SM. Keystone is available on GitHub at https://keystone-enclave.org/.
Evaluation
We aim to answer the following questions in our evaluation:
Rocket (Rocket) [19], and Berkeley Out-of-order Machine (BOOM) [32] (Table 2). All cores are configured to execute at one GHz frequency. For the open-source processors, we instantiate each platform with an FPGA-synthesized core using FireSim [57]. The host OS is buildroot-based Linux (kernel 4.15). All the evaluation was performed on SiFive HiFive Freedom Unleashed and the data is averaged over 10 runs unless otherwise specified. Additional data is available [3].
Modularity & Platform Support
Our configuration-based interface allows the eapp developer to turn various plugins on or off. We outline the qualitative measurement of this flexibility for extending features, reducing TCB, and platform deployability.
Extending RTs. First, our exemplary modular RT, Eyrie, has a number of plugins for running eapps.
Extending the SM. The advantage of an easily modified SM layer is noticeable when features require interaction with core TEE primitives like memory isolation. Our first example is an SM plugin for secure cache-partitioning on
Table 3 shows the TCB breakdown of each component if the eapp enables the plugin. Although Keystone requires the driver, host application, and host OS, these components are not trusted and do not contribute to the total TCB. Further, the eapp developer can tune the enclave TCB. Thus, Keystone TCB amounts to 10,000 LoC, which is within the realm of formal verification in the future.
Benchmarks
We conduct a detailed performance analysis of Keystone across several configurations. In particular, we first show a breakdown of the commonly required operations on each of the four platforms in Table 2. We instrument the Eyrie RT and SM for collecting timing data on the FU540 and use FireSim measurements for the three FPGA platforms. This experiment is designed to identify bottlenecks in Keystone and explain the performance of macro benchmarks in the subsequent evaluation. Next, we show performance overhead with four standard benchmarks: Beebs (CPU-bound), CoreMark (CPU-bound), RV8 (well-known RISC-V benchmark), and IOZone (I/O and FS) on the FU540. We then report the performance impact of the cache partitioning plugin with RV8 as an example of Keystone trade-offs.
Figure 6. IOZone file operation throughput in Keystone for various file and record sizes (e.g., r8 represents 8KB record). We only show write and read results due to limited space; other data at [3].
Common Operations. Figure 5 shows the breakdown of various enclave operations. The initial validation and measurement take 2M-7M cycles/page (Figure 5a). It dominates the cost of all other creation operations because Keystone uses a software implementation of SHA-3 [12], which can be replaced with an optimized version or (preferably) dedicated cryptographic hardware. Similarly, the attestation operations are dominated by the unoptimized ed25519 [6] signing software implementation (not shown in the graph, 0.7M-1.6M cycles). Both these steps are one-time operations per enclave execution and do not impact the execution time of eapp code after initialization. Next, we highlight the latency of an SM context switch (Figure 5b) from enclave to the host OS, which is the most frequent SM operation during the eapp execution. Keystone currently takes 1.8K-2.6K cycles depending on the platform. Lastly, FU540 (4-core) takes more cycles for enclave creation and destruction than other platforms (single-core), which can be attributed to multi-core PMP synchronization.
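To put these cycle counts in wall-clock terms, the following sketch converts them to milliseconds at the one GHz clock stated for the evaluation. The 100-page enclave is a hypothetical size chosen only to show the arithmetic, and the helper names are invented for this example.

```python
CLOCK_HZ = 1_000_000_000  # the evaluation configures all cores at one GHz


def cycles_to_ms(cycles):
    """Convert a cycle count to milliseconds at CLOCK_HZ."""
    return cycles * 1000 / CLOCK_HZ


def measurement_time_ms(pages, cycles_per_page):
    """Time to validate and measure `pages` pages at a given per-page cost."""
    return cycles_to_ms(pages * cycles_per_page)


low_ms = cycles_to_ms(2_000_000)    # 2.0 ms per page at the low end
high_ms = cycles_to_ms(7_000_000)   # 7.0 ms per page at the high end
switch_ms = cycles_to_ms(1_800)     # about 0.0018 ms (1.8 microseconds)
hypothetical_enclave_ms = measurement_time_ms(100, 7_000_000)  # 700.0 ms
```

The contrast between millisecond-scale per-page measurement and microsecond-scale context switches is why measurement is treated as a one-time startup cost rather than a steady-state overhead.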
Standard Benchmarks. We use several unmodified standard benchmarks as eapps to demonstrate the CPU overheads and the impact of I/O proxy in Keystone.
Beebs, CoreMark, and RV8. As expected, Keystone incurs no meaningful overheads (±0.7%) for pure CPU benchmarks as they run unmodified as eapps.
IOZone. Next, we use IOZone [77] to measure the overheads for various file operations requested by an eapp. We tunnel all the file-related system calls to the untrusted OS because the target file system is situated in the untrusted host. Figure 6 shows the throughput plots of common file-content access patterns. Although the trends are roughly the same, Keystone experiences high throughput loss for both write (avg. 36.2%) and read (avg. 40.9%). There are three factors which contribute to the high overhead: (a) all data crossing the privilege boundary is copied via the untrusted buffer (doubling the number of buffer copies), (b) each call requires the RT to go through the edge call interface, incurring a constant overhead, and (c) the untrusted buffer contends in cache with file buffers, incurring more throughput loss on re-write (avg. 38.0%), re-read (avg. 41.3%), and record re-write (avg. 55.1%) operations. Since (b) is a fixed cost per system call, it increases the overhead for the smaller record sizes.
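Why a fixed per-call cost weighs more heavily on small records can be seen in a toy cost model. The constants below are invented for illustration and are not measurements from the evaluation; the model covers only factors (a) and (b) and ignores the cache contention of factor (c).

```python
def throughput_loss(record_kb, native_us_per_kb=1.0,
                    copy_us_per_kb=0.5, edge_call_us=4.0):
    """Fraction of native throughput lost under a toy per-call cost model.

    All default constants are made-up illustrative values, not measured data.
    """
    native = native_us_per_kb * record_kb                 # time without the enclave
    extra = copy_us_per_kb * record_kb + edge_call_us     # extra copy + fixed edge call
    return extra / (native + extra)


losses = {kb: throughput_loss(kb) for kb in (8, 64, 512)}
```

In this model the loss shrinks as the record size grows, because the fixed edge call cost is amortized over more data, and it approaches the share due to the extra copy alone.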
Cache Partitioning. The mix of pure-CPU and large working-set benchmarks in RV8 is apt to demonstrate cache partitioning impact. We configure a 50-50 cache partitioning, where the enclave gets 8 of the 16 ways in the FU540 L2 cache. Figure 7 shows the performance overheads for RV8 with respect to native execution when the cache partitioning is turned off and on. We observe that small working-set tests show small performance overheads from the cache flush on context switches, whereas large working-set tests (primes, miniz, and aes) show up to 128.19% overhead as expected. However, the enclave initialization latency is not affected.
Case Studies
We present three case-studies where Keystone proves to be suitable for deploying a TEE and demonstrate how Keystone can be adapted for a varied set of devices, workloads, and application complexities. To this end, we chose several example applications: machine learning workloads for the client and server-side usage, a cryptographic library porting effort for varied RTs, and a small secure computation application written natively for Keystone. All the evaluation for the case-studies was performed on SiFive HiFive Freedom Unleashed.
Porting Efforts & Setup. We do not make any change to the application code logic; all applications have their test configurations and arguments hard-coded for consistency.
In the process of porting these applications, we add partial support in Keystone for standard libc implementations. Specifically, Keystone supports eapps statically linked with glibc and musl libc in the Eyrie RT. We port a widely used cryptographic library, namely libsodium, to both Eyrie and seL4 RT with zero developer effort. We measure both the setup time and internal execution time of eapps.
Case-study 1: Secure ML Inference with Torch and Eyrie.
We execute the Torch library with Eyrie RT to perform inference using nine models with increasing sizes of parameters and layers on the Imagenet dataset [44]. Specifically, we execute a total of 15,755 and 15,400 LoC of the TH [10] and THNN [13] libraries from the Torch framework compiled with musl libc. Each model implementation comprises an additional 230 to 13,399 LoC of model-specific inference code obtained from the ONNX model converter in Privado [96]. Table 4 shows the details of each model. We perform two sets of experiments: first, we execute the model inference code for each of the nine networks with static free memory allocation where we specify the maximum enclave size; second, we turn on the dynamic resizing plugin so that the enclave extends its size on-demand when it executes. Figure 8 shows the performance overheads for these two configurations with respect to the native execution without Keystone. We measure the time for initializing the enclave (e.g., loading, hashing, page setup) and executing the eapp logic.
Initialization Overhead. We observe that the initialization for Keystone is noticeably expensive for both static enclave size and dynamic resizing and is proportional to the eapp binary size. This is expected because the cost of hashing the enclave pages dominates the execution costs, whereas loading the binaries and setting up virtual memory are comparatively cheaper operations (as shown in Section 6.2). Another observation is that the dynamic resizing reduces the initialization latency by 2.9% on average as the RT does not need to allocate free memory during enclave creation.
eapp Execution Overhead. We report an overhead between −3.12% and 7.35% for all the models with both static enclave size and dynamic resizing. Specifically, LeNet is faster while Densenet is the most expensive in both cases. We explain this with three phenomena. First, Keystone loads the entire binary in physical memory before it begins eapp execution.
Thus, the eapp does not incur any page faults, whereas the baseline page faults when it loads the binary pages during the execution. This explains why smaller sized networks, LeNet in this case, execute faster in Keystone compared to the baseline. Second, the overhead is primarily proportional to the number of layers in the network, because a large number of layers leads to more memory allocations (for input and output tensors of each layer). This results in an increase in mmap and brk syscalls. We see this slowdown for large size allocations because Eyrie RT's custom mmap implementation is not as fast as the baseline kernel. We verified that this is indeed the source of overheads by hand-coding a small test which makes large allocation calls.
This explains why Densenet, which has the maximum number of layers (910), suffers from larger performance degradation. In summary, Keystone incurs acceptable overheads for long-running ML applications with a fixed one-time startup cost and the dynamic resizing plugin is useful for larger eapps.
Case-study 2: Secure ML with FANN and seL4. FANN, a minimal ANN implementation comprising a total of 8,462 C/C++ LoC, is suitable for embedded devices. We use seL4 RT, which is also widely used for embedded devices, to train and test a simple XOR network. We report an overhead of 0.36% for the end-to-end execution over baseline seL4 measured without Keystone. This shows that Keystone can be used for small devices such as IoT sensors and cameras to train models locally as well as flag events by executing model inference.
Case-study 3: Secure Remote Computation. We design and implement a simple secure server eapp (and remote client) that counts the number of words in a given message, and execute it with Eyrie and no plugins. The eapp first performs attestation using libsodium and establishes a secure channel with the remote client bound to the attestation report. The eapp then polls the host application for encrypted messages using the edge-call library, processes them inside the enclave, and returns an encrypted reply to be sent to the client. Our secure channel code using libsodium is 60 LoC, the edge-wrapping interface is 45 LoC, and the rest of the server eapp is 60 LoC. The generic host is 270 LoC and the remote client is 280 LoC. Keystone takes 45K cycles for a roundtrip with an empty message, which includes secure channel and message passing overheads. It takes 47K cycles between the host getting a message and the enclave notifying the host to send a reply.
Related Work
Keystone is the first framework for customizing TEEs, analogous to research areas such as SDNs [31], microkernels [46], and library OSes [88]. Here, we survey existing TEEs.
TEE Architecture Extensions. Several TEEs have been introduced by multiple vendors and researchers for protection against untrusted OSes, of which three major TEEs are directly related to Keystone: (a) Intel Software Guard Extensions (SGX) executes user-level code in an isolated virtual address space backed by encrypted RAM pages [72]; (b) ARM TrustZone divides the memory into two worlds (i.e., normal vs. secure) to run applications in protected memory [17]; and (c) Sanctum uses the memory management unit (MMU) and cache partitioning to isolate memory and prevent cache side-channel attacks [41]. Apart from these, several TEEs explore design options at various layers such as hypervisors [23, 35, 36, 39, 49, 53, 58, 66, 69, 94, 99, 101], physical memory [26, 33, 63, 65, 70, 79, 86], virtual memory [38, 42, 47, 87], and process isolation [43, 50, 83, 85, 93, 95].
Re-purposing Existing TEEs for Modularity. One way to meet Keystone design goals is to reuse the TEE solutions available on commodity CPUs. For each of these TEEs, it is possible to enable a subset of programming constructs (e.g., threading, dynamic loading of binaries) by including a software management component inside the enclave [5, 9, 11, 24, 34]. On the other hand, all of these are hardware extensions which are designed and implemented by the CPU manufacturer. Thus, they do not allow users to access the programmable interface at a layer underneath the untrusted OS. One way to simulate the programmable layer is by adding a trusted hypervisor layer which then executes an untrusted OS, but it inflates the TCB. Lastly, none of these potential designs allow for adapting to threat models and workloads.
TEE Programming Support. Previous works add expressiveness to TEE platforms. At the SM layer they optimize program-critical tasks (e.g., context switches, memory operations) [22, 41, 87], at the RT layer they target portability, functionality, and/or security [4, 5, 11, 18, 24, 34, 51, 74, 76, 90, 97], and at the eapp layer they reduce the developer's efforts for commonly used primitives [20, 21]. Although these systems are a fixed configuration in the TEE design space, they provide valuable lessons for Keystone feature design and future optimization.
Enhancing Security of TEEs. Better and secure TEE design has been a long-standing goal, with advocacy for security-by-design [55, 82]. We point out that Keystone is not vulnerable to a large class of side-channel attacks [30, 98] by design, while speculative execution attacks [29, 62] are limited to out-of-order RISC-V cores (e.g., BOOM) and do not affect most SoC implementations (e.g., Rocket). Keystone can re-use known cache side-channel defenses [26, 60] as we demonstrated in Section 4.3. Lastly, Keystone can benefit from various RISC-V proposals underway to secure IO operations with PMP [80]. Thus, Keystone either eliminates classes of attacks or allows integration with existing techniques.
Conclusion
We present Keystone, the first framework for customizable TEEs. With our modular design, we showcase the use of Keystone for several standard benchmarks and applications on illustrative RTs and various deployment platforms.
