433 research outputs found

    Hardware Mechanisms for Efficient Memory System Security

    Full text link
    The security of a computer system hinges on the trustworthiness of the operating system and the hardware, as applications rely on them to protect code and data. As a result, multiple protections for safeguarding the hardware and OS from attacks are being continuously proposed and deployed. These defenses, however, are far from ideal as they only provide partial protection, require complex hardware and software stacks, or incur high overheads. This dissertation presents hardware mechanisms for efficiently providing strong protections against an array of attacks on the memory hardware and the operating system’s code and data. In the first part of this dissertation, we analyze and optimize protections targeted at defending memory hardware from physical attacks. We begin by showing that, contrary to popular belief, current DDR3 and DDR4 memory systems that employ memory scrambling are still susceptible to cold boot attacks (where the DRAM is frozen to give it sufficient retention time and is then re-read by an attacker after reboot to extract sensitive data). We then describe how memory scramblers in modern memory controllers can be transparently replaced by strong stream ciphers without impacting performance. We also demonstrate how the large storage overheads associated with authenticated memory encryption schemes (which enable tamper-proof storage in off-chip memories) can be reduced by leveraging compact integer encodings and error-correcting code (ECC) DRAMs – without forgoing the error detection and correction capabilities of ECC DRAMs. The second part of this dissertation presents Neverland: a low-overhead, hardware-assisted, memory protection scheme that safeguards the operating system from rootkits and kernel-mode malware. Once the system is done booting, Neverland’s hardware takes away the operating system’s ability to overwrite certain configuration registers, as well as portions of its own physical address space that contain kernel code and security-critical data. Furthermore, it prohibits the CPU from fetching privileged code from any memory region lying outside the physical addresses assigned to the OS kernel and drivers. This combination of protections makes it extremely hard for an attacker to tamper with the kernel or introduce new privileged code into the system – even in the presence of software vulnerabilities. Neverland enables operating systems to reduce their attack surface without having to rely on complex integrity monitoring software or hardware. The hardware mechanisms we present in this dissertation provide building blocks for constructing a secure computing base while incurring lower overheads than existing protections.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147604/1/salessaf_1.pd

    Overview of Swallow --- A Scalable 480-core System for Investigating the Performance and Energy Efficiency of Many-core Applications and Operating Systems

    Full text link
    We present Swallow, a scalable many-core architecture, with a current configuration of 480 x 32-bit processors. Swallow is an open-source architecture, designed from the ground up to deliver scalable increases in usable computational power to allow experimentation with many-core applications and the operating systems that support them. Scalability is enabled by the creation of a tile-able system with a low-latency interconnect, featuring an attractive communication-to-computation ratio and the use of a distributed memory configuration. We analyse the energy and computational and communication performances of Swallow. The system provides 240GIPS with each core consuming 71--193mW, dependent on workload. Power consumption per instruction is lower than almost all systems of comparable scale. We also show how the use of a distributed operating system (nOS) allows the easy creation of scalable software to exploit Swallow's potential. Finally, we show two use case studies: modelling neurons and the overlay of shared memory on a distributed memory system.Comment: An open source release of the Swallow system design and code will follow and references to these will be added at a later dat

    Memory Subsystem Optimization Techniques for Modern High-Performance General-Purpose Processors

    Get PDF
    abstract: General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions. Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLCs while being cognizant of its interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse, or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%. Rather than the LLC, the L1 data cache has a pronounced impact on GPGPU performance by acting as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache that performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC that considers the GPU’s latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity. ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications. Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain performance scalability of GPG- PUs in the face of slowing Moore’s Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, it can achieve within 7% of the performance of a similar but hypothetical monolithic die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects form the key energy-efficiency bottleneck in the future. In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.Dissertation/ThesisDoctoral Dissertation Computer Science 201
    • …
    corecore