Search CORE

14 research outputs found

I/O Is Faster Than the CPU - Let's Partition Resources and Eliminate (Most) OS Abstractions

Author: Bailey Katelin
Firestone Daniel
Honda Michio
Hruby Tomas
Kalia Anuj
Kicinski Jakub
Kim Hyeong-Jun
Lim Hyeontaek
McCanne Steven
Mogul Jeffrey C.
Peter Simon
Rizzo Luigi
Shan Yizhou
Publication venue: ACM
Publication date: 01/01/2019
Field of study

Peer reviewe

Crossref

Helsingin yliopiston digitaalinen arkisto

Unikernels: the next stage of Linux’s dominance

Author: Appavoo Jonathan
Cadden James
Drepper Ulrich
Jones Richard
Krieger Orran
Mancuso Renato
Raza Ali
Sohal Parul
Woodman Larry
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/05/2019
Field of study

Unikernels have demonstrated enormous advantages over Linux in many important domains, causing some to propose that the days of Linux's dominance may be coming to an end. On the contrary, we believe that unikernels' advantages represent the next natural evolution for Linux, as it can adopt the best ideas from the unikernel approach and, along with its battle-tested codebase and large open source community, continue to dominate. In this paper, we posit that an upstreamable unikernel target is achievable from the Linux kernel, and, through an early Linux unikernel prototype, demonstrate that some simple changes can bring dramatic performance advantages.Accepted manuscrip

Boston University Institutional Repository (OpenBU)

Proving the Absence of Microarchitectural Timing Channels

Author: Buckley Scott
Heiser Gernot
Klein Gerwin
Millar Curtis
Murray Toby
Sison Robert
Wistoff Nils
Publication venue
Publication date: 25/10/2023
Field of study

Microarchitectural timing channels are a major threat to computer security. A set of OS mechanisms called time protection was recently proposed as a principled way of preventing information leakage through such channels and prototyped in the seL4 microkernel. We formalise time protection and the underlying hardware mechanisms in a way that allows linking them to the information-flow proofs that showed the absence of storage channels in seL4.Comment: Scott Buckley and Robert Sison were joint lead author

arXiv.org e-Print Archive

An Adaptive Resilience Testing Framework for Microservice Systems

Author: Lee Cheryl
Lyu Michael R.
Shen Jiacheng
Su Yuxin
Yang Tianyi
Yang Yongqiang
Publication venue
Publication date: 24/12/2022
Field of study

Resilience testing, which measures the ability to minimize service degradation caused by unexpected failures, is crucial for microservice systems. The current practice for resilience testing relies on manually defining rules for different microservice systems. Due to the diverse business logic of microservices, there are no one-size-fits-all microservice resilience testing rules. As the quantity and dynamic of microservices and failures largely increase, manual configuration exhibits its scalability and adaptivity issues. To overcome the two issues, we empirically compare the impacts of common failures in the resilient and unresilient deployments of a benchmark microservice system. Our study demonstrates that the resilient deployment can block the propagation of degradation from system performance metrics (e.g., memory usage) to business metrics (e.g., response latency). In this paper, we propose AVERT, the first AdaptiVE Resilience Testing framework for microservice systems. AVERT first injects failures into microservices and collects available monitoring metrics. Then AVERT ranks all the monitoring metrics according to their contributions to the overall service degradation caused by the injected failures. Lastly, AVERT produces a resilience index by how much the degradation in system performance metrics propagates to the degradation in business metrics. The higher the degradation propagation, the lower the resilience of the microservice system. We evaluate AVERT on two open-source benchmark microservice systems. The experimental results show that AVERT can accurately and efficiently test the resilience of microservice systems

arXiv.org e-Print Archive

Memory-Aware Functional IR for Higher-Level Synthesis of Accelerators

Author: Dubach Christophe
Juang Tzung-Han
Schlaak Christof
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2022
Field of study

Edinburgh Research Explorer

DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory (Extended Version)

Author: Aguilera Marcos K.
Chidambaram Vijay
Keeton Kimberly
Lee Sekwon
Ponnapalli Soujanya
Singhal Sharad
Publication venue
Publication date: 18/09/2022
Field of study

We present Dinomo, a novel key-value store for disaggregated persistent memory (DPM). Dinomo is the first key-value store for DPM that simultaneously achieves high common-case performance, scalability, and lightweight online reconfiguration. We observe that previously proposed key-value stores for DPM had architectural limitations that prevent them from achieving all three goals simultaneously. Dinomo uses a novel combination of techniques such as ownership partitioning, disaggregated adaptive caching, selective replication, and lock-free and log-free indexing to achieve these goals. Compared to a state-of-the-art DPM key-value store, Dinomo achieves at least 3.8x better throughput on various workloads at scale and higher scalability, while providing fast reconfiguration.Comment: This is an extended version of the full paper to appear in PVLDB 15.13 (VLDB 2023

arXiv.org e-Print Archive

DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack

Author: Cano José
Crowley Elliot J.
Gibson Perry
O'Boyle Michael
Storkey Amos
Publication venue
Publication date: 15/11/2023
Field of study

Deep Neural Networks (DNNs) are extremely computationally demanding, which presents a large barrier to their deployment on resource-constrained devices. Since such devices are where many emerging deep learning applications lie (e.g., drones, vision-based medical technology), significant bodies of work from both the machine learning and systems communities have attempted to provide optimizations to accelerate DNNs. To help unify these two perspectives, in this paper we combine machine learning and systems techniques within the Deep Learning Acceleration Stack (DLAS), and demonstrate how these layers can be tightly dependent on each other with an across-stack perturbation study. We evaluate the impact on accuracy and inference time when varying different parameters of DLAS across two datasets, seven popular DNN architectures, four DNN compression techniques, three algorithmic primitives with sparse and dense variants, untuned and auto-scheduled code generation, and four hardware platforms. Our evaluation highlights how perturbations across DLAS parameters can cause significant variation and across-stack interactions. The highest level observation from our evaluation is that the model size, accuracy, and inference time are not guaranteed to be correlated. Overall we make 13 key observations, including that speedups provided by compression techniques are very hardware dependent, and that compiler auto-tuning can significantly alter what the best algorithm to use for a given configuration is. With DLAS, we aim to provide a reference framework to aid machine learning and systems practitioners in reasoning about the context in which their respective DNN acceleration solutions exist in. With our evaluation strongly motivating the need for co-design, we believe that DLAS can be a valuable concept for exploring the next generation of co-designed accelerated deep learning solutions

arXiv.org e-Print Archive

ENHANCING CLOUD SYSTEM RUNTIME TO ADDRESS COMPLEX FAILURES

Author: Lou Chang
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 03/01/2024
Field of study

As the reliance on cloud systems intensifies in our progressively digital world, understanding and reinforcing their reliability becomes more crucial than ever. Despite impressive advancements in augmenting the resilience of cloud systems, the growing incidence of complex failures now poses a substantial challenge to the availability of these systems. With cloud systems continuing to scale and increase in complexity, failures not only become more elusive to detect but can also lead to more catastrophic consequences. Such failures question the foundational premises of conventional fault-tolerance designs, necessitating the creation of novel system designs to counteract them. This dissertation aims to enhance distributed systems’ capabilities to detect, localize, and react to complex failures at runtime. To this end, this dissertation makes contributions to address three emerging categories of failures in cloud systems. The first part delves into the investigation of partial failures, introducing OmegaGen, a tool adept at generating tailored checkers for detecting and localizing such failures. The second part grapples with silent semantic failures prevalent in cloud systems, showcasing our study findings, and introducing Oathkeeper, a tool that leverages past failures to infer rules and expose these silent issues. The third part explores solutions to slow failures via RESIN, a framework specifically designed to detect, diagnose, and mitigate memory leaks in cloud-scale infrastructures, developed in collaboration with Microsoft Azure. The dissertation concludes by offering insights into future directions for the construction of reliable cloud systems

Johns Hopkins University

Systems Support for Trusted Execution Environments

Author: Trach Bohdan
Publication venue
Publication date: 09/02/2022
Field of study

Cloud computing has become a default choice for data processing by both large corporations and individuals due to its economy of scale and ease of system management. However, the question of trust and trustoworthy computing inside the Cloud environments has been long neglected in practice and further exacerbated by the proliferation of AI and its use for processing of sensitive user data. Attempts to implement the mechanisms for trustworthy computing in the cloud have previously remained theoretical due to lack of hardware primitives in the commodity CPUs, while a combination of Secure Boot, TPMs, and virtualization has seen only limited adoption. The situation has changed in 2016, when Intel introduced the Software Guard Extensions (SGX) and its enclaves to the x86 ISA CPUs: for the first time, it became possible to build trustworthy applications relying on a commonly available technology. However, Intel SGX posed challenges to the practitioners who discovered the limitations of this technology, from the limited support of legacy applications and integration of SGX enclaves into the existing system, to the performance bottlenecks on communication, startup, and memory utilization. In this thesis, our goal is enable trustworthy computing in the cloud by relying on the imperfect SGX promitives. To this end, we develop and evaluate solutions to issues stemming from limited systems support of Intel SGX: we investigate the mechanisms for runtime support of POSIX applications with SCONE, an efficient SGX runtime library developed with performance limitations of SGX in mind. We further develop this topic with FFQ, which is a concurrent queue for SCONE's asynchronous system call interface. ShieldBox is our study of interplay of kernel bypass and trusted execution technologies for NFV, which also tackles the problem of low-latency clocks inside enclave. The two last systems, Clemmys and T-Lease are built on a more recent SGXv2 ISA extension. In Clemmys, SGXv2 allows us to significantly reduce the startup time of SGX-enabled functions inside a Function-as-a-Service platform. Finally, in T-Lease we solve the problem of trusted time by introducing a trusted lease primitive for distributed systems. We perform evaluation of all of these systems and prove that they can be practically utilized in existing systems with minimal overhead, and can be combined with both legacy systems and other SGX-based solutions. In the course of the thesis, we enable trusted computing for individual applications, high-performance network functions, and distributed computing framework, making a <vision of trusted cloud computing a reality

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa