231 research outputs found

    Portable Userspace Virtual Filesystem Switch

    Multiple different filesystems (disk-based, network, distributed, and abstract) are an integral part of every operating system. They are usually written as kernel modules and abstracted to the user via a virtual filesystem switch. In this paper we analyse the feasibility of reimplementing the virtual filesystem switch as a userspace daemon and the applicability of this approach in real-life usage. Such a reimplementation requires a way to virtualise process behaviour related to filesystem operations. The problem is non-trivial, as we assume that a VFS switch implemented in userspace has limited capabilities. We present a layered architecture comprising a monitoring process, the VFS abstraction and real filesystem implementations, all working in userspace. We then evaluate this solution in four areas: portability, feasibility, usability and performance. Our results demonstrate possible gains from using the userspace-based approach with monolithic kernels, but also underline the problems encountered with this approach.

    A Fully Userspace Remote Storage Access Stack

    As computer networking has evolved and the available throughput has increased, the efficiency of the network software stack has become increasingly important. The latency introduced by software has gone from insignificant, compared to historically poor network performance, to the largest component of latency on a modern local-area network. Currently, the vast majority of code that accesses the hardware is part of the kernel, because the kernel is responsible for ensuring that user applications do not interfere with each other when accessing the hardware. Remote Direct Memory Access (RDMA) lets applications perform direct data transfers over the network without context switches into the kernel, relying instead on specialized hardware interfaces to handle the virtual address mappings and transport protocols. This more intelligent hardware can be controlled directly from the userspace application, which eliminates the cost of context switches into the kernel and in turn reduces the overall latency of message transfers. Storage is currently undergoing a similar evolution. For most of the recent history of computing, the most common durable storage mechanism has been mechanical hard disk drives, which can only be accessed at block level and have high latency compared to the software drivers used to access the data. However, the introduction of solid state disks (SSDs) based on Flash significantly decreased the latency, as there are no mechanical parts that need to move to access the data. Upcoming non-volatile memory solutions reduce this latency even further, and even allow byte-level access to the storage medium. Thus, just as with networking, software drivers become the bottleneck, and we look for ways to bypass the kernel to improve the efficiency of direct userspace access to storage. This thesis offers two contributions as part of a solution to these problems. The first part introduces urdma, a software RDMA driver which leverages the Data Plane Development Kit (DPDK) to perform network data transfers in userspace without specialized RDMA interface hardware. The second part examines remote locking protocols, which are required for synchronization in distributed storage systems. We define an RDMA locking mechanism referred to as Verbs Offload Locking Technology (VOLT), which allows acquisition of a remote lock object without any CPU usage by the target node. This offloading allows VOLT to be used with disaggregated memory servers that have limited onboard CPU resources, while also lowering the application overhead for remote locking. Finally, we define a bytecode framework using extended Berkeley Packet Filter (eBPF) bytecode for extending the capabilities of an RDMA-capable network interface card (NIC) with new operations, and show how this can be used to implement our remote locking operation.

    Improved Architectures for Secure Intra-process Isolation

    Intra-process memory isolation can improve security by enforcing least-privilege at a finer granularity than traditional operating system controls without the context-switch overhead associated with inter-process communication. Because the process has traditionally been a fundamental security boundary, assigning different levels of trust to components within a process is a fundamental change in secure systems design. However, so far there has been little research on the challenges of securely implementing intra-process isolation on top of existing operating system abstractions. We find that frequently-used assumptions in secure system design do not precisely hold under realistic conditions, and that these discrepancies lead to exploitable vulnerabilities. We evaluate two recently-proposed memory isolation systems and show that both are vulnerable to the same generic attacks that break their security model. We then extend a subset of these attacks by applying them to a fully-precise model of control-flow integrity, demonstrating a data-only attack that bypasses both static and dynamic control-flow integrity enforcement by overwriting executable code in-memory even under typical W^X assumptions. From these two results, we propose a set of kernel modifications called Xlock that systemically addresses weaknesses in memory permissions enforcement on Linux, bringing them into line with W^X assumptions. Finally, we present modifications to intra-process isolation systems that preserve efficient userspace component transitions while drastically reducing risk of accidental kernel mismanagement by modeling intra-process components as separate processes from the kernel's perspective. Taken together, these mitigations represent a more robust architecture for efficient and secure intra-process isolation.

    Networking Subsystem Configuration Interface

    The goal of this master's thesis is to design a network configuration library, with emphasis on portability between Linux-based and BSD-based operating systems and on the extensibility of the library's support. The second chapter examines the configuration interfaces available on both operating systems. It analyses in detail the properties of Netlink sockets, the primary configuration interface for network elements on Linux, and the ioctl system call, which is less capable on Linux but is the primary interface on BSD and other UNIX systems. Interfaces for configuring different firewalls are also examined. The third chapter focuses on specific types of network devices, the particulars of their configuration, and how they relate to the kernel interfaces described in the second chapter. The fourth chapter formulates the requirements for the configuration library: easy extensibility, portability to different operating systems, support for monitoring changes and events, and extensibility with different types of user interfaces. Based on the research in the preceding two chapters, a library design is presented. The design defines the configuration interface as a hierarchy of abstract classes separated from the implementation. This makes it possible to have several implementations of the same configuration interface at once, even within a single operating system. The class LibNCFG is defined as the library's entry point and is responsible for creating these configuration objects on the user's behalf. This makes the library easy to extend with new operating system interfaces and with support for configuring new network elements. Support for a new user interface can be implemented as a new service that wraps the library interface and exposes a different one. To support change monitoring, the LibNCFG class provides methods for registering callbacks for defined events. The fourth chapter also describes in detail the interface of the LibNCFG class, the Common module, and the classes NetDevice, EthDevice and BondDevice, which define the configuration interfaces of the corresponding network device types. For these classes, the concrete classes NetlinkNetDevice, NetlinkEthDevice and sysfsBondDevice are implemented and their implementation details described. The fifth chapter describes a sample application implemented to demonstrate how easy the configuration library is to use. Finally, the conclusion summarizes the results of the thesis and discusses possible improvements and the continuation of the project.
    The goal of this thesis is to design a network configuration library with regard to operating system portability and extensibility of supported features. To achieve this portable design, the thesis explores and analyses the currently available network configuration options of Linux-based and BSD-based operating systems and of commonly used network devices. It provides an in-depth description of Netlink sockets on Linux as the primary network configuration interface, and of the ioctl system calls that are used on BSD systems. The gathered information is used to create a portable and extensible library design that separates the configuration interface from its implementation into a hierarchy of abstract classes. Furthermore, the class LibNCFG is defined as the entry point of the library, which handles object creation and destruction instead of the user. This design provides a high level of extensibility and ease of use at the same time. The thesis also describes the parts of the library that have been implemented so far, as well as a simple application that was created to showcase the ease of use of the library. In the end, the thesis summarizes the achieved results and discusses possible improvements and continuation of the project.

    Implementation-Oblivious Transparent Checkpoint-Restart for MPI

    This work presents experience with traditional use cases of checkpointing on a novel platform. A single codebase (MANA) transparently checkpoints production workloads for major available MPI implementations: "develop once, run everywhere". The new platform enables application developers to compile their application against any of the available standards-compliant MPI implementations, and test each MPI implementation according to performance or other features. Comment: 17 pages, 4 figures