Search CORE

4 research outputs found

GMEM: Generalized Memory Management for Peripheral Devices

Author: Cox Alan L.
Rixner Scott
Zhu Weixi
Publication venue
Publication date: 19/10/2023
Field of study

This paper presents GMEM, generalized memory management, for peripheral devices. GMEM provides OS support for centralized memory management of both CPU and devices. GMEM provides a high-level interface that decouples MMU-specific functions. Device drivers can thus attach themselves to a process's address space and let the OS take charge of their memory management. This eliminates the need for device drivers to "reinvent the wheel" and allows them to benefit from general memory optimizations integrated by GMEM. Furthermore, GMEM internally coordinates all attached devices within each virtual address space. This drastically improves user-level programmability, since programmers can use a single address space within their program, even when operating across the CPU and multiple devices. A case study on device drivers demonstrates these benefits. A GMEM-based IOMMU driver eliminates around seven hundred lines of code and obtains 54% higher network receive throughput utilizing 32% less CPU compared to the state-of-the-art. In addition, the GMEM-based driver of a simulated GPU takes less than 70 lines of code, excluding its MMU functions.Comment: Finished before Weixi left Rice and submitted to ASPLOS'2

arXiv.org e-Print Archive

Understanding PCIe performance for end host networking

Author: Dixon Colin
Kalia Anuj
Lee Ki Suh
Lim Hyeontaek
McVoy Larry
Moll L.
Peleg Omer
Pratt Ian
Radhakrishnan Sivasankar
Rizzo Luigi
Zazo Jose Fernando
Publication venue: SIGCOMM '18 Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication
Publication date: 25/08/2018
Field of study

In recent years, spurred on by the development and availability of programmable NICs, end hosts have increasingly become the enforcement point for core network functions such as load balancing, congestion control, and application specific network offloads. However, implementing custom designs on programmable NICs is not easy: many potential bottlenecks can impact performance. This paper focuses on the performance implication of PCIe, the de-facto I/O interconnect in contemporary servers, when interacting with the host architecture and device drivers. We present a theoretical model for PCIe and pcie-bench, an open-source suite, that allows developers to gain an accurate and deep understanding of the PCIe substrate. Using pcie-bench, we characterize the PCIe subsystem in modern servers. We highlight surprising differences in PCIe implementations, evaluate the undesirable impact of PCIe features such as IOMMUs, and show the practical limits for common network cards operating at 40Gb/s and beyond. Furthermore, through pcie-bench we gained insights which guided software and future hardware architectures for both commercial and research oriented network cards and DMA engines

Crossref

Apollo (Cambridge)

Recommended from our members

Exploitation from Malicious PCI Express Peripherals

Author: Rothwell Colin Lewis
Publication venue: University of Cambridge
Publication date: 23/03/2018
Field of study

The thesis of this dissertation is that, despite widespread belief in the security community, systems are still vulnerable to attacks from malicious peripherals delivered over the PCI Express (PCIe) protocol. Malicious peripherals can be plugged directly into internal PCIe slots, or connected via an external Thunderbolt connection. To prove this thesis, we designed and built a new PCIe attack platform. We discovered that a simple platform was insufficient to carry out complex attacks, so created the first PCIe attack platform that runs a full, conventional OS. To allows us to conduct attacks against higher-level OS functionality built on PCIe, we made the attack platform emulate in detail the behaviour of an Intel 82574L Network Interface Controller (NIC), by using a device model extracted from the QEMU emulator. We discovered a number of vulnerabilities in the PCIe protocol itself, and with the way that the defence mechanisms it provides are used by modern OSs. The principal defence mechanism provided is the Input/Output Memory Management Unit (IOMMU). The remaps the address space used by peripherals in 4KiB chunks, and can prevent access to areas of address space that a peripheral should not be able to access. We found that, contrary to belief in the security community, the IOMMUs in modern systems were not designed to protect against attacks from malicious peripherals, but to allow virtual machines direct access to real hardware. We discovered that use of the IOMMU is patchy even in modern operating systems. Windows effectively does not use the IOMMU at all; macOS opens windows that are shared by all devices; Linux and FreeBSD map windows into host memory separately for each device, but only if poorly documented boot flags are used. These OSs make no effort to ensure that only data that should be visible to the devices is in the mapped windows. We created novel attacks that subverted control flow and read private data against systems running macOS, Linux and FreeBSD with the highest level of relevant protection enabled. These represent the first use of the relevant exploits in each case. In the final part of this thesis, we evaluate the suitability of a number of proposed general purpose and specific mitigations against DMA attacks, and make a number of recommendations about future directions in IOMMU software and hardware.EPSRC and ARM iCASE Awar

Apollo (Cambridge)

Recommended from our members

Capability Memory Protection for Embedded Systems

Author: Xia Hongyan
Publication venue: University of Cambridge
Publication date: 04/02/2020
Field of study

This dissertation explores the use of capability security hardware and software in real-time and latency-sensitive embedded systems, to address existing memory safety and task isolation problems as well as providing new means to design a secure and scalable real-time system. In addition, this dissertation looks into how practical and high-performance temporal memory safety can be achieved under a capability architecture. State-of-the-art memory protection schemes for embedded systems typically present limited and inflexible solutions to memory protection and isolation, and fail to scale as embedded devices become more capable and ubiquitous. I investigate whether a capability architecture is able to provide new angles to address memory safety issues in an embedded scenario. Previous CHERI capability research focuses on 64-bit architectures in UNIX operating systems, which does not translate to typical 32-bit embedded processors with low-latency and real-time requirements. I propose and implement the CHERI CC-64 encoding and the CHERI-64 coprocessor to construct a feasible capability-enabled 32-bit CPU. In addition, I implement a real-time kernel for embedded systems atop CHERI-64. On this hardware and software platform, I focus on exploring scalable task isolation and fine-grained memory protection enabled by capabilities in a single flat physical address space, which are otherwise difficult or impossible to achieve via state-of-the-art approaches. Later, I present the evaluation of the hardware implementation and the software run-time overhead and real-time performance. Even with capability support, CHERI-64 as well as other CHERI processors still expose major attack surfaces through temporal vulnerabilities like use-after-free. A naive approach that sweeps memory to invalidate stale capabilities is inefficient and incurs significant cycle overhead and DRAM traffic. To make sweeping revocation feasible, I introduce new architectural mechanisms and micro-architectural optimisations to substantially reduce the cost of memory sweeping and capability revocation. Another factor of the cost is the frequency of memory sweeping. I explore tradeoffs of memory allocator designs that use quarantine buffers and shadow space tags to prevent frequent unnecessary sweeping. The evaluation shows that the optimisations and new allocator designs reduce the cost of capability sweeping revocation by orders of magnitude, making it already practical for most applications to adopt temporal safety under CHERI.CSC Cambridge Scholarshi

Apollo (Cambridge)