34 research outputs found

    Tolerating Malicious Device Drivers in Linux

    Get PDF
    URL to paper from conference siteThis paper presents SUD, a system for running existing Linux device drivers as untrusted user-space processes. Even if the device driver is controlled by a malicious adversary, it cannot compromise the rest of the system. One significant challenge of fully isolating a driver is to confine the actions of its hardware device. SUD relies on IOMMU hardware, PCI express bridges, and message-signaled interrupts to confine hardware devices. SUD runs unmodified Linux device drivers, by emulating a Linux kernel environment in user-space. A prototype of SUD runs drivers for Gigabit Ethernet, 802.11 wireless, sound cards, USB host controllers, and USB devices, and it is easy to add a new device class. SUD achieves the same performance as an in-kernel driver on networking benchmarks, and can saturate a Gigabit Ethernet link. SUD incurs a CPU overhead comparable to existing runtime driver isolation techniques, while providing much stronger isolation guarantees for untrusted drivers. Finally, SUD requires minimal changes to the kernel—just two kernel modules comprising 4,000 lines of code—which may at last allow the adoption of these ideas in practice

    Reinventing Scheduling for Multicore Systems

    Get PDF
    High performance on multicore processors requires that schedulers be reinvented. Traditional schedulers focus on keeping execution units busy by assigning each core a thread to run. Schedulers ought to focus, however, on high utilization of on-chip memory, rather than of execution cores, to reduce the impact of expensive DRAM and remote cache accesses. A challenge in achieving good use of on-chip memory is that the memory is split up among the cores in the form of many small caches. This paper argues for a form of scheduling that assigns each object and its operations to a specific core, moving a thread among the cores as it uses different objects

    A Software Approach to Unifying Multicore Caches

    Get PDF
    Multicore chips will have large amounts of fast on-chip cache memory, along with relatively slow DRAM interfaces. The on-chip cache memory, however, will be fragmented and spread over the chip; this distributed arrangement is hard for certain kinds of applications to exploit efficiently, and can lead to needless slow DRAM accesses. First, data accessed from many cores may be duplicated in many caches, reducing the amount of distinct data cached. Second, data in a cache distant from the accessing core may be slow to fetch via the cache coherence protocol. Third, software on each core can only allocate space in the small fraction of total cache memory that is local to that core. A new approach called software cache unification (SCU) addresses these challenges for applications that would be better served by a large shared cache. SCU chooses the on-chip cache in which to cache each item of data. As an application thread reads data items, SCU moves the thread to the core whose on-chip cache contains each item. This allows the thread to read the data quickly if it is already on-chip; if it is not, moving the thread causes the data to be loaded into the chosen on-chip cache. A new file cache for Linux, called MFC, uses SCU to improve performance of file-intensive applications, such as Unix file utilities. An evaluation on a 16-core AMD Opteron machine shows that MFC improves the throughput of file utilities by a factor of 1.6. Experiments with a platform that emulates future machines with less DRAM throughput per core shows that MFC will provide benefit to a growing range of applications.This material is based upon work supported by the National Science Foundation under grant number 0915164

    OpLog: a library for scaling update-heavy data structures

    Get PDF
    Existing techniques (e.g., RCU) can achieve good multi-core scaling for read-mostly data, but for update-heavy data structures only special-purpose techniques exist. This paper presents OpLog, a general-purpose library supporting good scalability for update-heavy data structures. OpLog achieves scalability by logging each update in a low-contention per-core log; it combines logs only when required by a read to the data structure. OpLog achieves generality by logging operations without having to understand them, to ease application to existing data structures. OpLog can further increase performance if the programmer indicates which operations can be combined in the logs. An evaluation shows how to apply OpLog to three update-heavy Linux kernel data structures. Measurements on a 48-core AMD server show that the result significantly improves the performance of the Apache web server and the Exim mail server under certain workloads

    An Analysis of Linux Scalability to Many Cores

    Get PDF
    URL to paper from conference siteThis paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48- core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques— this paper introduces one new technique, sloppy counters— these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required in total 3002 lines of code changes. A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet.Quanta Computer (Firm)National Science Foundation (U.S.) (0834415)National Science Foundation (U.S.) (0915164)Microsoft Research (Fellowship)Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowshi

    Making information flow explicit in HiStar

    Get PDF
    HiStar is a new operating system designed to minimize the amount of code that must be trusted. HiStar provides strict information flow control, which allows users to specify precise data security policies without unduly limiting the structure of applications. HiStar's security features make it possible to implement a Unix-like environment with acceptable performance almost entirely in an untrusted user-level library. The system has no notion of superuser and no fully trusted code other than the kernel. HiStar's features permit several novel applications, including privacy-preserving, untrusted virus scanners and a dynamic Web server with only a few thousand lines of trusted code.National Science Foundation (U.S.) (Cybertrust Award CNS-0716806)National Science Foundation (U.S.) (Cybertrust/DARPA Grant CNS-0430425

    Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures

    Get PDF
    We introduce "asynchronized concurrency (ASCY),'' a paradigm consisting of four complementary programming patterns. ASCY calls for the design of concurrent search data structures (CSDSs) to resemble that of their sequential counterparts. We argue that ASCY leads to implementations which are portably scalable: they scale across different types of hardware platforms, including single and multi-socket ones, for various classes of workloads, such as read-only and read-write, and according to different performance metrics, including throughput, latency, and energy. We substantiate our thesis through the most exhaustive evaluation of CSDSs to date, involving 6 platforms, 22 state-of-the-art CSDS algorithms, 10 re-engineered state-of-the-art CSDS algorithms following the ASCY patterns, and 2 new CSDS algorithms designed with ASCY in mind. We observe up to 30% improvements in throughput in the re-engineered algorithms, while our new algorithms out-perform the state-of-the-art alternatives

    Time Protection: the Missing OS Abstraction

    Get PDF
    Timing channels enable data leakage that threatens the security of computer systems, from cloud platforms to smartphones and browsers executing untrusted third-party code. Preventing unauthorised information flow is a core duty of the operating system, however, present OSes are unable to prevent timing channels. We argue that OSes must provide time protection in addition to the established memory protection. We examine the requirements of time protection, present a design and its implementation in the seL4 microkernel, and evaluate its efficacy as well as performance overhead on Arm and x86 processors

    Optimizing communication bottlenecks in multiprocessor operating system kernels

    No full text
    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.Cataloged from PDF version of thesis.Includes bibliographical references (pages 63-64).One difficulty of programming multicore processors is achieving performance that scales with the number of cores in the system. A common performance optimization is to increase inter-core parallelism. If the application is sufficiently parallelized, developers might hope that performance would scale as core count increases. Unfortunately for some applications, such as operating system kernels, parallelization reveals inter-core communication as a performance bottleneck. When data is updated on one core and read or written on other cores, the cache coherence protocol serializes accesses to the data. The result is that each access to the shared data can take hundreds to thousands of cycles, depending on how many cores are contending for the data. This dissertation focuses on optimizing communication bottlenecks caused by update-heavy workloads, where a data structure is frequently updated but rarely read. Such data structures are commonly used for operating system kernel bookkeeping, such as LRU lists, reverse maps in virtual memory, and file system notification queues. This dissertation identifies bottlenecks in the Linux kernel caused by update-heavy data structures, presents a general approach for optimizing communication in update-heavy data structures, and presents a library called OpLog that embodies this approach and helps developers achieve good scalability for update-heavy data structures. OpLog achieves scalability by logging update operations in per-core logs, and combining the logs only when required by a read to the data structure. Measurements on a 48-core AMD server show that applying OpLog to update-heavy data structures in the Linux kernel significantly improves application performance under certain workloads.by Silas Boyd-Wickizer.Ph. D
    corecore