945 research outputs found

    Hybrid Designs for Caches and Cores.

    Full text link
    Processor power constraints have come to the forefront over the last decade, heralded by the stagnation of clock frequency scaling. High-performance core and cache designs often utilize power-hungry techniques to increase parallelism. Conversely, the most energy-efficient designs opt for a serial execution to avoid unnecessary overheads. While both of these extremes constitute one-size-fits-all approaches, a judicious mix of parallel and serial execution has the potential to achieve the best of both high-performing and energy-efficient designs. This dissertation examines such hybrid designs for cores and caches. Firstly, we introduce a novel, hybrid out-of-order/in-order core microarchitecture. Instructions that are steered towards in-order execution skip register allocation, reordering and dynamic scheduling. At the same time, these instructions can interleave on an instruction-by-instruction basis with instructions that continue to benefit from these conventional out-of-order mechanisms. Secondly, this dissertation revisits a hybrid technique introduced for L1 caches, way-prediction, in the context of last-level caches that are larger, have higher associativity, and experience less locality.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113484/1/sleimanf_1.pd

    Near Data Processing for Efficient and Trusted Systems

    Full text link
    We live in a world which constantly produces data at a rate which only increases with time. Conventional processor architectures fail to process this abundant data in an efficient manner as they expend significant energy in instruction processing and moving data over deep memory hierarchies. Furthermore, to process large amounts of data in a cost effective manner, there is increased demand for remote computation. While cloud service providers have come up with innovative solutions to cater to this increased demand, the security concerns users feel for their data remains a strong impediment to their wide scale adoption. An exciting technique in our repertoire to deal with these challenges is near-data processing. Near-data processing (NDP) is a data-centric paradigm which moves computation to where data resides. This dissertation exploits NDP to both process the data deluge we face efficiently and design low-overhead secure hardware designs. To this end, we first propose Compute Caches, a novel NDP technique. Simple augmentations to underlying SRAM design enable caches to perform commonly used operations. In-place computation in caches not only avoids excessive data movement over memory hierarchy, but also significantly reduces instruction processing energy as independent sub-units inside caches perform computation in parallel. Compute Caches significantly improve the performance and reduce energy expended for a suite of data intensive applications. Second, this dissertation identifies security advantages of NDP. While memory bus side channel has received much attention, a low-overhead hardware design which defends against it remains elusive. We observe that smart memory, memory with compute capability, can dramatically simplify this problem. To exploit this observation, we propose InvisiMem which uses the logic layer in the smart memory to implement cryptographic primitives, which aid in addressing memory bus side channel efficiently. Our solutions obviate the need for expensive constructs like Oblivious RAM (ORAM) and Merkle trees, and have one to two orders of magnitude lower overheads for performance, space, energy, and memory bandwidth, compared to prior solutions. This dissertation also addresses a related vulnerability of page fault side channel in which the Operating System (OS) induces page faults to learn application's address trace and deduces application secrets from it. To tackle it, we propose Sanctuary which obfuscates page fault channel while allowing the OS to manage memory as a resource. To do so, we design a novel construct, Oblivious Page Management (OPAM) which is derived from ORAM but is customized for page management context. We employ near-memory page moves to reduce OPAM overhead and also propose a novel memory partition to reduce OPAM transactions required. For a suite of cloud applications which process sensitive data we show that page fault channel can be tackled at reasonable overheads.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/144139/1/shaizeen_1.pd

    Parallel and Distributed Computing

    Get PDF
    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing

    Adaptive Microarchitectural Optimizations to Improve Performance and Security of Multi-Core Architectures

    Get PDF
    With the current technological barriers, microarchitectural optimizations are increasingly important to ensure performance scalability of computing systems. The shift to multi-core architectures increases the demands on the memory system, and amplifies the role of microarchitectural optimizations in performance improvement. In a multi-core system, microarchitectural resources are usually shared, such as the cache, to maximize utilization but sharing can also lead to contention and lower performance. This can be mitigated through partitioning of shared caches.However, microarchitectural optimizations which were assumed to be fundamentally secure for a long time, can be used in side-channel attacks to exploit secrets, as cryptographic keys. Timing-based side-channels exploit predictable timing variations due to the interaction with microarchitectural optimizations during program execution. Going forward, there is a strong need to be able to leverage microarchitectural optimizations for performance without compromising security. This thesis contributes with three adaptive microarchitectural resource management optimizations to improve security and/or\ua0performance\ua0of multi-core architectures\ua0and a systematization-of-knowledge of timing-based side-channel attacks.\ua0We observe that to achieve high-performance cache partitioning in a multi-core system\ua0three requirements need to be met: i) fine-granularity of partitions, ii) locality-aware placement and iii) frequent changes. These requirements lead to\ua0high overheads for current centralized partitioning solutions, especially as the number of cores in the\ua0system increases. To address this problem, we present an adaptive and scalable cache partitioning solution (DELTA) using a distributed and asynchronous allocation algorithm. The\ua0allocations occur through core-to-core challenges, where applications with larger performance benefit will gain cache capacity. The\ua0solution is implementable in hardware, due to low computational complexity, and can scale to large core counts.According to our analysis, better performance can be achieved by coordination of multiple optimizations for different resources, e.g., off-chip bandwidth and cache, but is challenging due to the increased number of possible allocations which need to be evaluated.\ua0Based on these observations, we present a solution (CBP) for coordinated management of the optimizations: cache partitioning, bandwidth partitioning and prefetching.\ua0Efficient allocations, considering the inter-resource interactions and trade-offs, are achieved using local resource managers to limit the solution space.The continuously growing number of\ua0side-channel attacks leveraging\ua0microarchitectural optimizations prompts us to review attacks and defenses to understand the vulnerabilities of different microarchitectural optimizations. We identify the four root causes of timing-based side-channel attacks: determinism, sharing, access violation\ua0and information flow.\ua0Our key insight is that eliminating any of the exploited root causes, in any of the attack steps, is enough to provide protection.\ua0Based on our framework, we present a systematization of the attacks and defenses on a wide range of microarchitectural optimizations, which highlights their key similarities.\ua0Shared caches are an attractive attack surface for side-channel attacks, while defenses need to be efficient since the cache is crucial for performance.\ua0To address this issue, we present an adaptive and scalable cache partitioning solution (SCALE) for protection against cache side-channel attacks. The solution leverages randomness,\ua0and provides quantifiable and information theoretic security guarantees using differential privacy. The solution closes the performance gap to a state-of-the-art non-secure allocation policy for a mix of secure and non-secure applications
    • …
    corecore