535 research outputs found

    Systematic Reverse Engineering of Cache Slice Selection in Intel Processors

    Get PDF
    Dividing last level caches into slices is a popular method to prevent memory accesses from becoming a bottleneck on modern multicore processors. In order to assess and understand the benefits of cache slicing in detail, a precise knowledge of implementation details such as the slice selection algorithm are of high importance. However, slice selection methods are mostly unstudied, and processor manufacturers choose not to publish their designs, nor their design rationale. In this paper, we present a tool that allows to recover the slice selection algorithm for Intel processors. The tool uses cache access information to derive equations that allow the reconstruction of the applied slice selection algorithm. Thereby, the tool provides essential information for performing last level cache attacks and enables further exploration of the behavior of modern caches. The tool is successfully applied to a range of Intel CPUs with different slices and architectures. Results show that slice selection algorithms have become more complex over time by involving an increasing number of bits of the physical address. We also demonstrate that among the most recent processors, the slice selection algorithm depends on the number of CPU cores rather than the processor model

    Optimizing Coherence Traffic in Manycore Processors Using Closed-Form Caching/Home Agent Mappings

    Get PDF
    [Abstract] Manycore processors feature a high number of general-purpose cores designed to work in a multithreaded fashion. Recent manycore processors are kept coherent using scalable distributed directories. A paramount example is the Intel Mesh interconnect, which consists of a network-on-chip interconnecting “tiles”, each of which contains computation cores, local caches, and coherence masters. The distributed coherence subsystem must be queried for every out-of-tile access, imposing an overhead on memory latency. This paper studies the physical layout of an Intel Knights Landing processor, with a particular focus on the coherence subsystem, and uncovers the pseudo-random mapping function of physical memory blocks across the pieces of the distributed directory. Leveraging this knowledge, candidate optimizations to improve memory latency through the minimization of coherence traffic are studied. Although these optimizations do improve memory throughput, ultimately this does not translate into performance gains due to inherent overheads stemming from the computational complexity of the mapping functions.Ministerio de Educación; FPU16/00816U.S. National Science Foundation; CCF-1750399Xunta de Galicia and FEDER; ED431G 2019/01Ministerio de Ciencia e Innovación; PID2019-104184RB-I0

    Cross-core Microarchitectural Attacks and Countermeasures

    Get PDF
    In the last decade, multi-threaded systems and resource sharing have brought a number of technologies that facilitate our daily tasks in a way we never imagined. Among others, cloud computing has emerged to offer us powerful computational resources without having to physically acquire and install them, while smartphones have almost acquired the same importance desktop computers had a decade ago. This has only been possible thanks to the ever evolving performance optimization improvements made to modern microarchitectures that efficiently manage concurrent usage of hardware resources. One of the aforementioned optimizations is the usage of shared Last Level Caches (LLCs) to balance different CPU core loads and to maintain coherency between shared memory blocks utilized by different cores. The latter for instance has enabled concurrent execution of several processes in low RAM devices such as smartphones. Although efficient hardware resource sharing has become the de-facto model for several modern technologies, it also poses a major concern with respect to security. Some of the concurrently executed co-resident processes might in fact be malicious and try to take advantage of hardware proximity. New technologies usually claim to be secure by implementing sandboxing techniques and executing processes in isolated software environments, called Virtual Machines (VMs). However, the design of these isolated environments aims at preventing pure software- based attacks and usually does not consider hardware leakages. In fact, the malicious utilization of hardware resources as covert channels might have severe consequences to the privacy of the customers. Our work demonstrates that malicious customers of such technologies can utilize the LLC as the covert channel to obtain sensitive information from a co-resident victim. We show that the LLC is an attractive resource to be targeted by attackers, as it offers high resolution and, unlike previous microarchitectural attacks, does not require core-colocation. Particularly concerning are the cases in which cryptography is compromised, as it is the main component of every security solution. In this sense, the presented work does not only introduce three attack variants that can be applicable in different scenarios, but also demonstrates the ability to recover cryptographic keys (e.g. AES and RSA) and TLS session messages across VMs, bypassing sandboxing techniques. Finally, two countermeasures to prevent microarchitectural attacks in general and LLC attacks in particular from retrieving fine- grain information are presented. Unlike previously proposed countermeasures, ours do not add permanent overheads in the system but can be utilized as preemptive defenses. The first identifies leakages in cryptographic software that can potentially lead to key extraction, and thus, can be utilized by cryptographic code designers to ensure the sanity of their libraries before deployment. The second detects microarchitectural attacks embedded into innocent-looking binaries, preventing them from being posted in official application repositories that usually have the full trust of the customer

    Cache Timing Attacks on Public Key Encryption

    Get PDF
    The rise of cloud computing has made it a lot easier for attackers to be able to run code on the same processors as their target. This has made many attacks more viable. This thesis discusses a cache timing attack targeting the LibTomMath library. LibTom-Math is a mathematical library for computations using large integers. The library is used in some cryptographic libraries such the commercial solution WolfCrypt. The attack mainly focuses on the modular exponentiation function of LibTom-Math which is a major part of RSA implementations. The aim of the attack is to use cache timing in order to extract the long term private key used by the server for encrypting communications. Recovering the private key, gives the attacker access to past and future communications secured using this key, which usually has a lifespan of at least one year. The attack only requires that it shares a processor with the victim and works even if the attack process and the victim process are running on different Virtual Machines. The thesis includes a description of the RSA cipher as well as the various optimizations that are used in a lot of cryptographic libraries. Next, it describes how to use cache timing to exploit some of those optimizations in order to gain information about the secret exponent based on the memory access patterns of the target code. Finally, it discusses the limitations of the attack as well as how cloud service providers, cryptographic library developers, as well as processor manufacturers, may be able to mitigate this class of attacks

    Curious case of Rowhammer: Flipping Secret Exponent Bits using Timing Analysis

    Get PDF
    Rowhammer attacks have exposed a serious vulnerability in modern DRAM chips to induce bit flips in data which is stored in memory. In this paper, we develop a methodology to combine timing analysis to perform the hammering in a controlled manner to create bit flips in cryptographic keys which are stored in memory. The attack would require only user level privilege for Linux kernel versions before 4.0 and is unaware of the memory location of the key. An intelligent combination of timing Prime + Probe attack and row-buffer collision is shown to induce bit flip faults in a 1024 bit RSA key on modern processors using realistic number of hammering attempts. This demonstrates the feasibility of fault analysis of ciphers using purely software means on commercial x86 architectures, which to the best of our knowledge has not been reported earlier. The attack is also relevant for the newest Linux kernel in a Cross-VM environment where the VMs having root privilege are not denied to access the pagemap

    Mapping the Intel Last-Level Cache

    Get PDF
    Modern Intel processors use an undisclosed hash function to map memory lines into last-level cache slices. In this work we develop a technique for reverse-engineering the hash function. We apply the technique to a 6-core Intel processor and demonstrate that knowledge of this hash function can facilitate cache-based side channel attacks, reducing the amount of work required for profiling the cache by three orders of magnitude. We also show how using the hash function we can double the number of colours used for page-colouring techniques
    corecore