25 research outputs found

    Systems Support for Trusted Execution Environments

    Get PDF
    Cloud computing has become a default choice for data processing by both large corporations and individuals due to its economy of scale and ease of system management. However, the question of trust and trustoworthy computing inside the Cloud environments has been long neglected in practice and further exacerbated by the proliferation of AI and its use for processing of sensitive user data. Attempts to implement the mechanisms for trustworthy computing in the cloud have previously remained theoretical due to lack of hardware primitives in the commodity CPUs, while a combination of Secure Boot, TPMs, and virtualization has seen only limited adoption. The situation has changed in 2016, when Intel introduced the Software Guard Extensions (SGX) and its enclaves to the x86 ISA CPUs: for the first time, it became possible to build trustworthy applications relying on a commonly available technology. However, Intel SGX posed challenges to the practitioners who discovered the limitations of this technology, from the limited support of legacy applications and integration of SGX enclaves into the existing system, to the performance bottlenecks on communication, startup, and memory utilization. In this thesis, our goal is enable trustworthy computing in the cloud by relying on the imperfect SGX promitives. To this end, we develop and evaluate solutions to issues stemming from limited systems support of Intel SGX: we investigate the mechanisms for runtime support of POSIX applications with SCONE, an efficient SGX runtime library developed with performance limitations of SGX in mind. We further develop this topic with FFQ, which is a concurrent queue for SCONE's asynchronous system call interface. ShieldBox is our study of interplay of kernel bypass and trusted execution technologies for NFV, which also tackles the problem of low-latency clocks inside enclave. The two last systems, Clemmys and T-Lease are built on a more recent SGXv2 ISA extension. In Clemmys, SGXv2 allows us to significantly reduce the startup time of SGX-enabled functions inside a Function-as-a-Service platform. Finally, in T-Lease we solve the problem of trusted time by introducing a trusted lease primitive for distributed systems. We perform evaluation of all of these systems and prove that they can be practically utilized in existing systems with minimal overhead, and can be combined with both legacy systems and other SGX-based solutions. In the course of the thesis, we enable trusted computing for individual applications, high-performance network functions, and distributed computing framework, making a <vision of trusted cloud computing a reality

    Execution Environments for Running Legacy Applications in Multi-Party Trust Settings

    Get PDF
    Applications often assume that the same party owns all of the application’s resources, and that these resources require the same level of privacy. This assumption no longer holds when organizations outsource applications to a third-party cloud, or when the application requires access to not only public content, but private configuration, such as authentication and keying material. The result of this broken assumption is that applications either must be re-written to accommodate each new security posture, or used as-is, accepting that one party exposes private data to another. In this dissertation, I argue the following thesis: it is possible to run legacy application binaries with confidentiality and integrity guarantees that reflect a multi-party trust setting. I support this thesis through the design, implementation, and evaluation of two distinct application-level virtualization layers that handle trust concerns on behalf of the application: conclaves and SecureMigration. Conclaves assume the availability of Intel SGX secure hardware enclaves and extend prior work in developing runtimes that execute legacy applications within an enclave. In contrast, SecureMigration does not use secure hardware, but rather composes information flow control with process migration to execute a process across multiple physical machines owned and operated by distinct principals, while shielding each principal’s sensitive portion of the process from its peers

    Event-driven servers using asynchronous, non-blocking network I/O: Performance evaluation of kqueue and epoll

    Get PDF
    This research project evaluates the performance of kqueue and epoll in the context of event-driven servers. The evaluation is done through benchmarking and tracing which are used to measure throughput and execution time respectively. The experiment is repeated for both a virtualised and native server environment. The results from the experiment are statistically analysed and compared. These results show significant differences between kqueue and epoll, and a profound impact of virtualisation as a variable

    Enabling Hyperscale Web Services

    Full text link
    Modern web services such as social media, online messaging, web search, video streaming, and online banking often support billions of users, requiring data centers that scale to hundreds of thousands of servers, i.e., hyperscale. In fact, the world continues to expect hyperscale computing to drive more futuristic applications such as virtual reality, self-driving cars, conversational AI, and the Internet of Things. This dissertation presents technologies that will enable tomorrow’s web services to meet the world’s expectations. The key challenge in enabling hyperscale web services arises from two important trends. First, over the past few years, there has been a radical shift in hyperscale computing due to an unprecedented growth in data, users, and web service software functionality. Second, modern hardware can no longer support this growth in hyperscale trends due to a decline in hardware performance scaling. To enable this new hyperscale era, hardware architects must become more aware of hyperscale software needs and software researchers can no longer expect unlimited hardware performance scaling. In short, systems researchers can no longer follow the traditional approach of building each layer of the systems stack separately. Instead, they must rethink the synergy between the software and hardware worlds from the ground up. This dissertation establishes such a synergy to enable futuristic hyperscale web services. This dissertation bridges the software and hardware worlds, demonstrating the importance of that bridge in realizing efficient hyperscale web services via solutions that span the systems stack. The specific goal is to design software that is aware of new hardware constraints and architect hardware that efficiently supports new hyperscale software requirements. This dissertation spans two broad thrusts: (1) a software and (2) a hardware thrust to analyze the complex hyperscale design space and use insights from these analyses to design efficient cross-stack solutions for hyperscale computation. In the software thrust, this dissertation contributes uSuite, the first open-source benchmark suite of web services built with a new hyperscale software paradigm, that is used in academia and industry to study hyperscale behaviors. Next, this dissertation uses uSuite to study software threading implications in light of today’s hardware reality, identifying new insights in the age-old research area of software threading. Driven by these insights, this dissertation demonstrates how threading models must be redesigned at hyperscale by presenting an automated approach and tool, uTune, that makes intelligent run-time threading decisions. In the hardware thrust, this dissertation architects both commodity and custom hardware to efficiently support hyperscale software requirements. First, this dissertation characterizes commodity hardware’s shortcomings, revealing insights that influenced commercial CPU designs. Based on these insights, this dissertation presents an approach and tool, SoftSKU, that enables cheap commodity hardware to efficiently support new hyperscale software paradigms, improving the efficiency of real-world web services that serve billions of users, saving millions of dollars, and meaningfully reducing the global carbon footprint. This dissertation also presents a hardware-software co-design, uNotify, that redesigns commodity hardware with minimal modifications by using existing hardware mechanisms more intelligently to overcome new hyperscale overheads. Next, this dissertation characterizes how custom hardware must be designed at hyperscale, resulting in industry-academia benchmarking efforts, commercial hardware changes, and improved software development. Based on this characterization’s insights, this dissertation presents Accelerometer, an analytical model that estimates gains from hardware customization. Multiple hyperscale enterprises and hardware vendors use Accelerometer to make well-informed hardware decisions.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/169802/1/akshitha_1.pd

    Automating Seccomp Filter Generation for Linux Applications

    Get PDF
    Software vulnerabilities in applications undermine the security of applications. By blocking unused functionality, the impact of potential exploits can be reduced. While seccomp provides a solution for filtering syscalls, it requires manual implementation of filter rules for each individual application. Recent work has investigated automated approaches for detecting and installing the necessary filter rules. However, as we show, these approaches make assumptions that are not necessary or require overly time-consuming analysis. In this paper, we propose Chestnut, an automated approach for generating strict syscall filters for Linux userspace applications with lower requirements and limitations. Chestnut comprises two phases, with the first phase consisting of two static components, i.e., a compiler and a binary analyzer, that extract the used syscalls during compilation or in an analysis of the binary. The compiler-based approach of Chestnut is up to factor 73 faster than previous approaches without affecting the accuracy adversely. On the binary analysis level, we demonstrate that the requirement of position-independent binaries of related work is not needed, enlarging the set of applications for which Chestnut is usable. In an optional second phase, Chestnut provides a dynamic refinement tool that allows restricting the set of allowed syscalls further. We demonstrate that Chestnut on average blocks 302 syscalls (86.5%) via the compiler and 288 (82.5%) using the binary-level analysis on a set of 18 widely used applications. We found that Chestnut blocks the dangerous exec syscall in 50% and 77.7% of the tested applications using the compiler- and binary-based approach, respectively. For the tested applications, Chestnut prevents exploitation of more than 62% of the 175 CVEs that target the kernel via syscalls. Finally, we perform a 6 month long-term study of a sandboxed Nginx server

    High Performance Web Servers: A Study In Concurrent Programming Models

    Get PDF
    With the advent of commodity large-scale multi-core computers, the performance of software running on these computers has become a challenge to researchers and enterprise developers. While academic research and industrial products have moved in the direction of writing scalable and highly available services using distributed computing, single machine performance remains an active domain, one which is far from saturated. This thesis selects an archetypal software example and workload in this domain, and describes software characteristics affecting performance. The example is highly-parallel web-servers processing a static workload. Particularly, this work examines concurrent programming models in the context of high-performance web-servers across different architectures — threaded (Apache, Go and μKnot), event-driven (Nginx, μServer) and staged (WatPipe) — compared with two static workloads in two different domains. The two workloads are a Zipf distribution of file sizes representing a user session pulling an assortment of many small and a few large files, and a 50KB file representing chunked streaming of a large audio or video file. Significant effort is made to fairly compare eight web-servers by carefully tuning each via their adjustment parameters. Tuning plays a significant role in workload-specific performance. The two domains are no disk I/O (in-memory file set) and medium disk I/O. The domains are created by lowering the amount of RAM available to the web-server from 4GB to 2GB, forcing files to be evicted from the file-system cache. Both domains are also restricted to 4 CPUs. The primary goal of this thesis is to examine fundamental performance differences between threaded and event-driven concurrency models, with particular emphasis on user-level threading models. Additionally, a secondary goal of the work is to examine high-performance software under restricted hardware environments. Over-provisioned hardware environments can mask architectural and implementation shortcomings in software – the hypothesis in this work is that restricting resources stresses the application, bringing out important performance characteristics and properties. Experimental results for the given workload show that memory pressure is one of the most significant factors for the degradation of web-server performance, because it forces both the onset and amount of disk I/O. With an ever increasing need to support more content at faster rates, a web-server relies heavily on in-memory caching of files and related content. In fact, personal and small business web-servers are even run on minimal hardware, like the Raspberry Pi, with only 1GB of RAM and a small SD card for the file system. Therefore, understanding behaviour and performance in restricted contexts should be a normal aspect of testing a web server (and other software systems)

    A New System Architecture for Heterogeneous Compute Units

    Get PDF
    The ongoing trend to more heterogeneous systems forces us to rethink the design of systems. In this work, I study a new system design that considers heterogeneous compute units (general-purpose cores with different instruction sets, DSPs, FPGAs, fixed-function accelerators, etc.) from the beginning instead of as an afterthought. The goal is to treat all compute units (CUs) as first-class citizens, enabling (1) isolation and secure communication between all types of CUs, (2) a direct interaction of all CUs, removing the conventional CPU from the critical path, and (3) access to operating system (OS) services such as file systems and network stacks for all CUs. To study this system design, I am using a hardware/software co-design based on two key ideas: 1) introduce a new hardware component next to each CU used by the OS as the CUs' common interface and 2) let the OS kernel control applications remotely from a different CU. The hardware component is called data transfer unit (DTU) and offers the minimal set of features to reach the stated goals: secure message passing and memory access. The OS is called MÂł and runs its kernel on a dedicated CU and runs the OS services and applications on the remaining CUs. The kernel is responsible for establishing DTU-based communication channels between services and applications. After a channel has been set up, services and applications communicate directly without involving the kernel. This approach allows to support arbitrary CUs as aforementioned first-class citizens, ranging from fixed-function accelerators to complex general-purpose cores
    corecore