75 research outputs found
A New System Architecture for Heterogeneous Compute Units
The ongoing trend to more heterogeneous systems forces us to rethink the design of systems. In this work, I study a new system design that considers heterogeneous compute units (general-purpose cores with different instruction sets, DSPs, FPGAs, fixed-function accelerators, etc.) from the beginning instead of as an afterthought. The goal is to treat all compute units (CUs) as first-class citizens, enabling (1) isolation and secure communication between all types of CUs, (2) a direct interaction of all CUs, removing the conventional CPU from the critical path, and (3) access to operating system (OS) services such as file systems and network stacks for all CUs.
To study this system design, I am using a hardware/software co-design based on two key ideas: (1) introduce a new hardware component next to each CU, used by the OS as the CUs' common interface, and (2) let the OS kernel control applications remotely from a different CU. The hardware component is called the data transfer unit (DTU) and offers the minimal set of features needed to reach the stated goals: secure message passing and memory access. The OS is called M³; it runs its kernel on a dedicated CU and the OS services and applications on the remaining CUs. The kernel is responsible for establishing DTU-based communication channels between services and applications. After a channel has been set up, services and applications communicate directly without involving the kernel. This approach makes it possible to support arbitrary CUs, ranging from fixed-function accelerators to complex general-purpose cores, as the aforementioned first-class citizens.
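The channel model described in this abstract (the kernel sets up a channel once, after which the endpoints talk directly) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the class names `DTU` and `Kernel`, the method `create_channel`, and the endpoint numbering are hypothetical and are not the M³ or DTU interface.

```python
class DTU:
    """Toy data transfer unit: holds the send endpoints of one compute unit."""

    def __init__(self, cu_name):
        self.cu_name = cu_name
        self.send_endpoints = {}  # endpoint id -> destination DTU
        self.inbox = []           # received (sender, payload) pairs

    def send(self, ep_id, payload):
        # A CU may only use endpoints that the kernel configured for it.
        if ep_id not in self.send_endpoints:
            raise PermissionError(f"{self.cu_name}: endpoint {ep_id} not configured")
        # Delivery goes straight to the destination DTU, not through the kernel.
        self.send_endpoints[ep_id].inbox.append((self.cu_name, payload))


class Kernel:
    """Toy kernel: the only party that configures endpoints (hypothetical API)."""

    def __init__(self):
        self.channels_created = 0

    def create_channel(self, src, ep_id, dst):
        src.send_endpoints[ep_id] = dst
        self.channels_created += 1


app = DTU("accelerator")
fs = DTU("fs-service")
kernel = Kernel()

# Before the kernel sets up a channel, the application cannot reach the service.
try:
    app.send(0, "read /etc/hosts")
    blocked = False
except PermissionError:
    blocked = True

# One kernel interaction establishes the channel ...
kernel.create_channel(app, 0, fs)

# ... and every later message bypasses the kernel entirely.
app.send(0, "read /etc/hosts")
app.send(0, "read /etc/passwd")
```

In this sketch the kernel is involved exactly once per channel, which mirrors the abstract's point that the kernel is off the critical path after setup.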
Managing Access Control in Virtual Private Networks
Virtual Private Network technology allows remote network users to benefit from resources on a private network as if their host machines actually resided on the network. However, each resource on a network may also have its own access control policies, which may be completely unrelated to network access. Thus users' access to a network (even by VPN technology) does not guarantee their access to the sought resources. With the introduction of more complicated access privileges, such as delegated access, it is conceivable for a scenario to arise where a user can access a network remotely (because of direct permissions from the network administrator or by delegated permission) but cannot access any resources on the network. There is, therefore, a need for a network access control mechanism that understands the privileges of each remote network user on one hand, and the access control policies of various network resources on the other hand, and so can aid a remote user in accessing these resources based on the user's privileges. This research presents a software solution in the form of a centralized access control framework called an Access Control Service (ACS) that can grant remote users network presence and simultaneously aid them in accessing various network resources with varying access control policies. At the same time, the ACS provides a centralized framework for administrators to manage access to their resources. The ACS achieves these objectives using VPN technology, network address translation, and by proxying various authentication protocols on behalf of remote users.
Systems Support for Trusted Execution Environments
Cloud computing has become a default choice for data processing by both large corporations and individuals due to its economy of scale and ease of system management. However, the question of trust and trustworthy computing inside cloud environments has long been neglected in practice and has been further exacerbated by the proliferation of AI and its use for processing sensitive user data. Attempts to implement mechanisms for trustworthy computing in the cloud previously remained theoretical due to the lack of hardware primitives in commodity CPUs, while a combination of Secure Boot, TPMs, and virtualization has seen only limited adoption. The situation changed in 2016, when Intel introduced the Software Guard Extensions (SGX) and its enclaves to x86 CPUs: for the first time, it became possible to build trustworthy applications relying on a commonly available technology. However, Intel SGX posed challenges to practitioners, who discovered the limitations of this technology, from the limited support for legacy applications and the integration of SGX enclaves into existing systems, to the performance bottlenecks in communication, startup, and memory utilization. In this thesis, our goal is to enable trustworthy computing in the cloud by relying on the imperfect SGX primitives. To this end, we develop and evaluate solutions to issues stemming from the limited systems support for Intel SGX: we investigate the mechanisms for runtime support of POSIX applications with SCONE, an efficient SGX runtime library developed with the performance limitations of SGX in mind. We further develop this topic with FFQ, a concurrent queue for SCONE's asynchronous system call interface. ShieldBox is our study of the interplay of kernel bypass and trusted execution technologies for NFV, which also tackles the problem of low-latency clocks inside enclaves. The last two systems, Clemmys and T-Lease, are built on the more recent SGXv2 ISA extension. In Clemmys, SGXv2 allows us to significantly reduce the startup time of SGX-enabled functions inside a Function-as-a-Service platform. Finally, in T-Lease we solve the problem of trusted time by introducing a trusted lease primitive for distributed systems. We evaluate all of these systems and show that they can be practically utilized in existing systems with minimal overhead, and can be combined with both legacy systems and other SGX-based solutions. In the course of the thesis, we enable trusted computing for individual applications, high-performance network functions, and distributed computing frameworks, making the vision of trusted cloud computing a reality.
Cost-Aware Resource Management for Decentralized Internet Services
Decentralized network services, such as naming systems, content
distribution networks, and publish-subscribe systems, play an
increasingly critical role and are required to provide high
performance, low latency service, achieve high availability in the
presence of network and node failures, and handle a large volume
of users. Judicious utilization of expensive system resources,
such as memory space, network bandwidth, and number of machines,
is fundamental to achieving the above properties. Yet, current
network services typically rely on less-informed, heuristic-based
techniques to manage scarce resources, and often fall short of
expectations.
This thesis presents a principled approach for building high
performance, robust, and scalable network services. The key
contribution of this thesis is to show that resolving the
fundamental cost-benefit tradeoff between resource consumption and
performance through mathematical optimization is practical in
large-scale distributed systems, and enables decentralized network
services to efficiently meet system-wide performance goals. This
thesis presents a practical approach for resource management in
three stages: analytically model the cost-benefit tradeoff as a
constrained optimization problem, determine a near-optimal
resource allocation strategy on the fly, and enforce the derived
strategy through light-weight, decentralized mechanisms. It
builds on self-organizing structured overlays, which provide
failure resilience and scalability, and complements them with
stronger performance guarantees and robustness under sudden
changes in workload. This work enables applications to meet
system-wide performance targets, such as low average response
times, high cache hit rates, and small update dissemination times
with low resource consumption. Alternatively, applications can
make the maximum use of available resources, such as storage and
bandwidth, and derive large gains in performance.
I have implemented an extensible framework called Honeycomb to
perform cost-aware resource management on structured overlays
based on the above approach and built three critical network
services using it. These services consist of a new name system for
the Internet called CoDoNS that distributes data associated with
domain names, an open-access content distribution network called
CobWeb that caches web content for faster access by users, and an
online information monitoring system called Corona that notifies
users about changes to web pages. Simulations and performance
measurements from a planetary-scale deployment show that these
services provide unprecedented performance improvement over the
current state of the art.
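The idea of treating resource management as a constrained optimization (maximize performance benefit subject to a resource budget) can be shown with a deliberately simple stand-in. The sketch below uses a greedy benefit-per-cost heuristic, which is a common approximation for budgeted selection and is not claimed to be the solver used by Honeycomb; the function name `allocate`, the object names, and all numbers are hypothetical.

```python
def allocate(items, budget):
    """Greedily pick items by benefit per unit cost until the budget is spent.

    items: list of (name, benefit, cost) tuples with cost > 0.
    Returns (chosen names, total benefit, total cost).
    """
    # Rank by benefit obtained per unit of resource consumed.
    ranked = sorted(items, key=lambda it: it[1] / it[2], reverse=True)
    chosen, total_benefit, total_cost = [], 0.0, 0.0
    for name, benefit, cost in ranked:
        # Take an item only if it still fits within the resource budget.
        if total_cost + cost <= budget:
            chosen.append(name)
            total_benefit += benefit
            total_cost += cost
    return chosen, total_benefit, total_cost


# Hypothetical objects: (name, expected latency savings, storage cost).
objects = [
    ("popular.example", 90.0, 30.0),
    ("medium.example", 40.0, 20.0),
    ("rare.example", 10.0, 25.0),
    ("tiny.example", 8.0, 2.0),
]
chosen, benefit, cost = allocate(objects, budget=55.0)
```

With these made-up numbers the rarely requested object is left out because replicating it would exceed the budget while contributing little benefit.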
High Availability and Scalability of Mainframe Environments using System z and z/OS as example
Mainframe computers are the backbone of industrial and commercial computing, hosting the most relevant and critical data of businesses. One of the most important mainframe environments is IBM System z with the operating system z/OS. This book introduces the mainframe technology of System z and z/OS with respect to high availability and scalability. It highlights how these properties are realized at different levels of the hardware and software stack to satisfy the needs of large IT organizations.
CheriOS: Designing an untrusted single-address-space capability operating system utilising capability hardware and a minimal hypervisor
This thesis presents the design, implementation, and evaluation of a novel capability operating system: CheriOS. The guiding motivation behind CheriOS is to provide strong security guarantees to programmers, even allowing them to continue to program in fast, but typically unsafe, languages such as C. Furthermore, it does this in the presence of an extremely strong adversarial model: in CheriOS, every compartment -- and even the operating system itself -- is considered actively malicious. Building on top of the architecturally enforced capabilities offered by the CHERI microprocessor, I show that only a few more capability types and enforcement checks are required to provide a strong compartmentalisation model that can facilitate mutual distrust. I implement these new primitives in software, in a new abstraction layer I dub the nanokernel. Among the new OS primitives I introduce are one for integrity and confidentiality called a Reservation (which allows allocating private memory without trusting the allocator), as well as another that can provide attestation about the state of the system, a Foundation (which provides a key to sign and protect capabilities based on a signature of the starting state of a program). I show that, using these new facilities, it is possible to design an operating system without having to trust that its implementation is correct.
CheriOS is fundamentally fail-safe; there are no assumptions about the behaviour of the system, apart from the CHERI processor and the nanokernel, to be broken. Using CHERI and the new nanokernel primitives, programmers can expect full isolation at scopes ranging from a whole program to a single function, and not just with respect to other programs but the system itself. Programs compiled for and run on CheriOS offer full memory safety, both spatial and temporal, enforced control flow integrity between compartments and protection against common vulnerabilities such as buffer overflows, code injection and Return-Oriented-Programming attacks. I achieve this by designing a new CHERI-based ABI (Application Binary Interface) which includes a novel stack structure that offers temporal safety. I evaluate how practical the new designs are by prototyping them and offering a detailed performance evaluation. I also contrast with existing offerings from both industry and academia.
CHERI capabilities can be used to restrict access to system resources, such as memory, with the required dynamic checks being performed by hardware in parallel with normal operation. Using the accelerating features of CHERI, I show that many of the security guarantees that CheriOS offers can come at little to no cost. I present a novel and secure IO/IPC layer that allows secure marshalling of multiple data streams through mutually distrusting compartments, with fine-grained authenticated access control for end-points, and without either copying or encryption. For example, CheriOS can restrict its TCP stack from having access to packet contents, or restrict an open socket to ensure that data sent on it arrives at an endpoint signed as a TLS implementation. Even with added security requirements, CheriOS can perform well on real workloads. I showcase this by running a state-of-the-art webserver, NGINX, atop both CheriOS and FreeBSD and show improvements in performance ranging from 3x to 6x when running on a small-scale low-power FPGA implementation of CHERI-MIPS.
TREDIS – A Trusted Full-Fledged SGX-Enabled REDIS Solution
Currently, offloading storage and processing capacity to cloud servers is a growing
trend among web-enabled services managing big datasets. This happens because high
storage capacity and powerful processors are expensive, whilst cloud services provide
cheaper, ongoing, elastic, and reliable solutions. The problem with these cloud-based
outsourced solutions is that they are highly accessible through the Internet, which is good,
but they can therefore be considerably exposed to attacks outside of users' control. By exploiting
subtle vulnerabilities present in cloud-enabled applications, management functions,
operating systems, and hypervisors, an attacker may compromise the supported systems,
and thus the privacy of the sensitive user data hosted and managed in them. These
attacks can be motivated by malicious purposes such as espionage, blackmail, identity
theft, or harassment. A solution to this problem is to process data without exposing it to
untrusted components, such as vulnerable OS components, which might be compromised
by an attacker.
In this thesis, we survey existing technologies capable of bringing applications into
trusted execution environments, in order to adopt such approaches in our solution as a
way to help deploy unmodified applications on top of Intel SGX with overheads
comparable to those of applications designed to use this kind of technology, and we conduct an
experimental evaluation to better understand how they impact our system. We then
present TREDIS, a Trusted Full-Fledged REDIS Key-Value Store solution, designed
to be offered as a Trusted Cloud-enabled Platform as a Service,
which includes the possibility of supporting a secure REDIS-cluster architecture built on
docker-virtualized services running in SGX-enabled instances, with operations
running on always-encrypted in-memory datasets.
Moving applications' storage and processing to cloud servers
is a growing trend, especially when large datasets must be
managed. Compared with privately licensed solutions,
cloud computing and data storage services
can offer cheaper, highly available, elastic, and relatively
reliable options. These third-party solutions are easily accessible through
the Internet and are operated on an outsourcing basis, which is good, but
they are therefore considerably exposed to attacks and lie outside users' control
with respect to the actual conditions of reliability, security, and data privacy.
By subtly exploiting vulnerabilities present in applications, operating system
(OS) functions, OS service virtualization libraries, or hypervisors, an
attacker may compromise the systems and break the privacy of sensitive data. These
attacks can be motivated by malicious purposes such as espionage, blackmail, identity
theft, or harassment, and can be triggered by intrusions (from external
attackers) or by malicious or incorrect actions of internal attackers (who may act
with system administrator privileges). A solution to this problem is
to store and process information without exposing it to
untrusted components.
In this dissertation we study and experimentally evaluate several technologies that
allow applications to run in isolation in trusted execution environments
backed by Intel SGX hardware, in order to better understand how they work and
how to adapt them to our solution. To that end, we carried out an evaluation focused on using
these technologies with virtualization in isolated containers running on trusted
hardware, which we used in designing our solution. We then present our
solution TREDIS, a trusted Key-Value Store system based on REDIS technology,
with guarantees of execution integrity and data privacy, designed to
be used as a "Platform as a Service" for resilient data management and storage
in the cloud. This includes the possibility of supporting a secure architecture with
resilience guarantees similar to the cluster replication architecture of the
original REDIS solution, but in which the node execution engines and the cluster's memory
protection are based on isolated docker containers virtualized in SGX instances, with the data always kept encrypted in memory.
Operating system support for warehouse-scale computing
Modern applications are increasingly backed by large-scale data centres. Systems software in these data centre environments, however, faces substantial challenges: the lack of uniform resource abstractions makes sharing and resource management inefficient, infrastructure software lacks end-to-end access control mechanisms, and work placement ignores the effects of hardware heterogeneity and workload interference.
In this dissertation, I argue that uniform, clean-slate operating system (OS) abstractions designed to support distributed systems can make data centres more efficient and secure. I present a novel distributed operating system for data centres, focusing on two OS components: the abstractions for resource naming, management and protection, and the scheduling of work to compute resources.
First, I introduce a reference model for a decentralised, distributed data centre OS, based on pervasive distributed objects and inspired by concepts in classic 1980s distributed OSes. Translucent abstractions free users from having to understand implementation details, but enable introspection for performance optimisation. Fine-grained access control is supported by combining storable, communicable identifier capabilities, and context-dependent, ephemeral handle capabilities. Finally, multi-phase I/O requests implement optimistically concurrent access to objects while supporting diverse application-level consistency policies.
Second, I present the DIOS operating system, an implementation of my model as an extension to Linux. The DIOS system call API is centred around distributed objects, globally resolvable names, and translucent references that carry context-sensitive object meta-data. I illustrate how these concepts support distributed applications, and evaluate the performance of DIOS in microbenchmarks and a data-intensive MapReduce application. I find that it offers improved, fine-grained isolation of resources, while permitting flexible sharing.
Third, I present the Firmament cluster scheduler, which generalises prior work on scheduling via minimum-cost flow optimisation. Firmament can flexibly express many scheduling policies using pluggable cost models; it makes high-quality placement decisions based on fine-grained information about tasks and resources; and it scales the flow-based scheduling approach to very large clusters. In two case studies, I show that Firmament supports policies that reduce colocation interference between tasks and that it successfully exploits flexibility in the workload to improve the energy efficiency of a heterogeneous cluster. Moreover, my evaluation shows that Firmament scales the minimum-cost flow optimisation to clusters of tens of thousands of machines while still making sub-second placement decisions.
St John's College Supplementary Emolument Fund
DARP
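Scheduling via minimum-cost flow, as mentioned in the abstract above, can be illustrated on a tiny example: tasks are connected to machines by edges whose costs encode placement preference, machine-to-sink capacities encode free slots, and a min-cost flow yields the placement. The sketch below uses a textbook successive-shortest-paths solver with Bellman-Ford; it is not Firmament's solver or cost model, and the task names, machine names, and costs are hypothetical.

```python
def min_cost_flow(n, edges, source, sink):
    """Successive shortest paths; edges are (u, v, capacity, cost) tuples.

    Returns (total flow, total cost, flow on each input edge).
    """
    graph = [[] for _ in range(n)]  # entries: [to, residual cap, cost, reverse index]
    handles = []                    # (u, position in graph[u]) for each input edge
    for u, v, cap, cost in edges:
        handles.append((u, len(graph[u])))
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    total_flow = total_cost = 0
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [float("inf")] * n
        prev = [None] * n
        dist[source] = 0
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        updated = True
            if not updated:
                break
        if prev[sink] is None:
            break  # no augmenting path left
        # Find the bottleneck capacity along the path, then push flow along it.
        push, v = float("inf"), sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        total_flow += push
        total_cost += push * dist[sink]
    flows = [edges[k][2] - graph[u][i][1] for k, (u, i) in enumerate(handles)]
    return total_flow, total_cost, flows


# Hypothetical cluster: 3 tasks, machine m0 with 2 free slots, m1 with 1.
names = ["source", "t0", "t1", "t2", "m0", "m1", "sink"]
idx = {name: k for k, name in enumerate(names)}
placement_costs = {("t0", "m0"): 1, ("t0", "m1"): 4,
                   ("t1", "m0"): 2, ("t1", "m1"): 3,
                   ("t2", "m0"): 5, ("t2", "m1"): 1}
edges = [(idx["source"], idx[t], 1, 0) for t in ("t0", "t1", "t2")]
edges += [(idx[t], idx[m], 1, c) for (t, m), c in placement_costs.items()]
edges += [(idx["m0"], idx["sink"], 2, 0), (idx["m1"], idx["sink"], 1, 0)]

flow, cost, flows = min_cost_flow(len(names), edges, idx["source"], idx["sink"])
# A task is placed on a machine when its task-to-machine edge carries flow.
placement = {names[u]: names[v]
             for (u, v, _, _), f in zip(edges, flows)
             if f > 0 and names[u].startswith("t")}
```

Changing the edge costs changes the policy without touching the solver, which is the sense in which a flow-based scheduler can express different policies through its cost model.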
ENDBOX: Scalable Middlebox Functions Using Client-Side Trusted Execution
Many organisations enhance the performance, security, and functionality of their managed networks by deploying middleboxes centrally as part of their core network. While this simplifies maintenance, it also increases cost because middlebox hardware must scale with the number of clients. A promising alternative is to outsource middlebox functions to the clients themselves, thus leveraging their CPU resources. Such an approach, however, raises security challenges for critical middlebox functions such as firewalls and intrusion detection systems.
We describe EndBox, a system that securely executes middlebox functions on client machines at the network edge. Its design combines a virtual private network (VPN) with middlebox functions that are hardware-protected by a trusted execution environment (TEE), as offered by Intel's Software Guard Extensions (SGX). By maintaining VPN connection endpoints inside SGX enclaves, EndBox ensures that all client traffic, including encrypted communication, is processed by the middlebox. Despite its decentralised model, EndBox's middlebox functions remain maintainable: they are centrally controlled and can be updated efficiently. We demonstrate EndBox with two scenarios involving (i) a large company; and (ii) an Internet service provider that both need to protect their network and connected clients. We evaluate EndBox by comparing it to centralised deployments of common middlebox functions, such as load balancing, intrusion detection, firewalling, and DDoS prevention. We show that EndBox achieves up to 3.8x higher throughput and scales linearly with the number of clients