17 research outputs found

    Enabling hardware randomization across the cache hierarchy in Linux-Class processors

    The most promising secure-cache design approaches use cache-set randomization to index cache contents, thus thwarting cache side-channel attacks. Unfortunately, existing randomization proposals cannot be successfully applied to processors' cache hierarchies due to the overhead added when dealing with coherency and virtual memory. In this paper, we solve existing limitations of hardware randomization approaches and propose a cost-effective randomization implementation for the whole cache hierarchy of a Linux-capable RISC-V processor.

    This work has been supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328). The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of the total eligible cost. We also thank Red-RISCV for its efforts to promote activities around open hardware. This work has received funding from the EU Horizon 2020 programme under grant agreement no. 871467 (SELENE). M. Doblas has been partially supported by the Agency for Management of University and Research Grants (AGAUR) of the Government of Catalonia under Beques de Col·laboració d'estudiants en departaments universitaris per al curs 2019-2020. V. Kostalabros has been partially supported by AGAUR under the Ajuts per a la contractació de personal investigador novell fellowship, number 2019FI_B01274. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104.
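
    A minimal sketch of the cache-set randomization idea the paper builds on (the geometry, hash choice, and all identifiers below are illustrative assumptions, not the paper's implementation): the set index is derived from a keyed hash of the line address instead of its low-order bits, so an attacker cannot predict which addresses collide in a set.

```python
import hashlib

NUM_SETS = 256   # assumed geometry: 256 sets of 64-byte lines
LINE = 64

def classic_index(addr: int) -> int:
    """Conventional indexing: low-order bits of the line address."""
    return (addr // LINE) % NUM_SETS

def randomized_index(addr: int, key: bytes) -> int:
    """Keyed-hash indexing: set placement depends on a secret key,
    so address-to-set collisions are unpredictable to an attacker."""
    line = (addr // LINE).to_bytes(8, "little")
    digest = hashlib.blake2b(line, key=key, digest_size=4).digest()
    return int.from_bytes(digest, "little") % NUM_SETS

key = b"per-boot secret key"   # re-keying re-randomizes every mapping
a, b = 0x10000, 0x10000 + NUM_SETS * LINE
print(classic_index(a) == classic_index(b))                   # True: guaranteed collision
print(randomized_index(a, key) == randomized_index(b, key))   # almost surely False
```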

    Random and Safe Cache Architecture to Defeat Cache Timing Attacks

    Caches have been exploited to leak secret information due to the different times they take to handle memory accesses. Cache timing attacks include non-speculative cache side- and covert-channel attacks and cache-based speculative execution attacks. We first present a systematic view of the attack and defense space and show that no existing defense has addressed both the speculative and non-speculative cache timing attack families, which we do in this paper. We propose Random and Safe (RaS) cache architectures to decorrelate cache state changes from memory requests. RaS fills the cache with "safe" cache lines that are likely to be used in the future, rather than with demand-fetched, security-sensitive lines. RaS captures a group of safe addresses during runtime and fetches addresses randomly displaced from these addresses. Our proposed RaS architecture is flexible and allows security-performance trade-offs. We show different designs of RaS architectures that can defeat cache side-channel attacks and cache-based speculative execution attacks. The RaS variant against cache-based speculative execution attacks has a 4.2% average performance overhead, and other RaS variants against both attack families have 7.9% to 45.2% average overhead. For some benchmarks, RaS defenses improve performance while providing security.
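
    A minimal sketch of the fill policy described in the abstract, under assumed parameters (64-byte lines, a 4 KiB displacement window, and a plain list of captured addresses are all illustrative choices): on a fill, the cache receives a line randomly displaced from a captured safe address rather than the security-sensitive demand line.

```python
import random

LINE = 64        # assumed 64-byte cache lines
REGION = 4096    # assumed displacement window around a safe address

safe_addresses: list[int] = []   # runtime-captured group of "safe" addresses

def record_safe(addr: int) -> None:
    """Capture an address considered safe to reveal (illustrative policy)."""
    safe_addresses.append(addr)

def ras_fill_address(demand_addr: int) -> int:
    """Instead of filling the demand-fetched (security-sensitive) line,
    fill a line randomly displaced from a captured safe address."""
    if not safe_addresses:
        return demand_addr                 # fallback: nothing captured yet
    base = random.choice(safe_addresses)
    offset = random.randrange(-REGION, REGION, LINE)
    return (base + offset) & ~(LINE - 1)   # align to a line boundary
```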

    New Cache Attacks and Defenses

    The sharing of the last-level cache (LLC) among different physical cores makes the cache vulnerable to side-channel attacks. An attacker can obtain private information about co-running applications (victims) by monitoring their accesses. Cache side-channel attacks can be mitigated by partitioning the cache between the victim and the attacker. However, previous partitioning works make the incorrect assumption that only the victim's cache misses are visible to attackers. In this work, we provide the key insight that both cache hits and cache misses from the victim are vulnerable. Although a cache hit does not affect the existence state of a line, it can still change the replacement state and coherence state, which can also leak information to attackers. Based on this, we propose the Invisible-Victim cache (IVcache), a new cache design that can mitigate both traditional LLC attacks and the new variants. IVcache classifies all processes as protected or unprotected. For accesses from protected processes, IVcache handles state changes in a slightly different way to make those accesses invisible to any other process. We evaluate the defense effectiveness and performance of IVcache in the gem5 simulator. We show that IVcache can defend against real-world attacks and that it introduces negligible performance overhead for both protected and unprotected processes.
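
    A simplified model of the key insight (the class and its policy are illustrative, not the paper's mechanism): accesses from protected processes return data without updating the LRU replacement state, so they stay invisible to anyone observing that state. In this toy model protected accesses leave the set entirely unchanged; the real design handles protected misses with more care.

```python
from collections import OrderedDict

class IVSet:
    """One set of an LRU cache where accesses by protected processes
    leave the observable replacement state unchanged (toy model)."""

    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> None, ordered LRU -> MRU

    def access(self, tag: int, protected: bool) -> bool:
        hit = tag in self.lines
        if protected:
            # Protected accesses must not disturb observable state:
            # serve the data without LRU promotion or eviction.
            return hit
        if hit:
            self.lines.move_to_end(tag)          # normal LRU promotion
        else:
            if len(self.lines) >= self.ways:
                self.lines.popitem(last=False)   # evict the LRU line
            self.lines[tag] = None
        return hit
```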

    A Case for Reversible Coherence Protocol

    We propose the first Reversible Coherence Protocol (RCP), a new protocol designed from the ground up to enable invisible speculative loads. RCP takes a bold approach by including speculative loads and merge/purge operations in the interface between the processor and cache coherence, allowing them to participate in the coherence protocol: speculative loads, ordinary loads/stores, and merge/purge can all affect the state of a given cache line. RCP is the first coherence protocol that enables the commit and squash of speculative loads among distributed cache components in a general memory hierarchy. RCP incurs an average slowdown of (3.0%, 8.3%, 7.4%) on (SPEC2006, SPEC2017, PARSEC), which is lower than (26.5%, 12%, 18.3%) for InvisiSpec and (3.2%, 9.4%, 24.2%) for CleanupSpec. The coherence traffic overhead is on average 46%, compared to 40% and 27% for InvisiSpec and CleanupSpec, respectively. Even with this higher traffic overhead (~46%), the performance overhead of RCP is lower than InvisiSpec's and comparable to CleanupSpec's. This reveals a key advantage of RCP: the coherence actions triggered by merge and purge operations are not on the critical path of execution and can be performed in the cache hierarchy concurrently with processor execution.
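
    A toy model of the merge/purge interface (names and structure are assumptions made for illustration; the actual protocol coordinates these operations across distributed cache components): a speculative fill is held apart from committed state until it is either merged on commit or purged on squash, making the speculation reversible.

```python
class RCPLine:
    """Illustrative cache-line state for a reversible protocol:
    speculative fills are held separately until merged or purged."""

    def __init__(self):
        self.committed = None     # architecturally visible data
        self.speculative = None   # data fetched by a speculative load

    def spec_load(self, data):
        """Hold a speculative fill, visible only to the speculating core."""
        self.speculative = data

    def merge(self):
        """Speculation resolved correctly: make the fill architectural."""
        if self.speculative is not None:
            self.committed = self.speculative
            self.speculative = None

    def purge(self):
        """Speculation squashed: reverse all state changes."""
        self.speculative = None
```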

    An Associativity Threshold Phenomenon in Set-Associative Caches

    In an $\alpha$-way set-associative cache, the cache is partitioned into disjoint sets of size $\alpha$, and each item can only be cached in one set, typically selected via a hash function. Set-associative caches are widely used and have many benefits, e.g., in terms of latency or concurrency, over fully associative caches, but they often incur more cache misses. As the set size $\alpha$ decreases, the benefits increase, but the paging costs worsen. In this paper we characterize the performance of an $\alpha$-way set-associative LRU cache of total size $k$, as a function of $\alpha = \alpha(k)$. We prove the following, assuming that sets are selected using a fully random hash function:
    - For $\alpha = \omega(\log k)$, the paging cost of an $\alpha$-way set-associative LRU cache is within additive $O(1)$ of that of a fully associative LRU cache of size $(1-o(1))k$, with probability $1 - 1/\operatorname{poly}(k)$, for all request sequences of length $\operatorname{poly}(k)$.
    - For $\alpha = o(\log k)$, and for all $c = O(1)$ and $r = O(1)$, the paging cost of an $\alpha$-way set-associative LRU cache is not within a factor $c$ of that of a fully associative LRU cache of size $k/r$, for some request sequence of length $O(k^{1.01})$.
    - For $\alpha = \omega(\log k)$, if the hash function can be occasionally changed, the paging cost of an $\alpha$-way set-associative LRU cache is within a factor $1 + o(1)$ of that of a fully associative LRU cache of size $(1-o(1))k$, with probability $1 - 1/\operatorname{poly}(k)$, for request sequences of arbitrary (e.g., super-polynomial) length.
    Some of our results generalize to other paging algorithms besides LRU, such as least-frequently used (LFU).
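
    A small experiment, not the paper's proof, illustrating the flavor of the first result under assumptions (uniform random requests, a fully random hash, arbitrarily chosen sizes): as $\alpha$ grows, the miss count of an $\alpha$-way set-associative LRU cache approaches that of a fully associative LRU cache of the same total size.

```python
import random
from collections import OrderedDict

def lru_misses(requests, capacity, num_sets, seed=0):
    """Misses of a set-associative LRU cache with a random hash.
    num_sets=1 gives a fully associative cache of the same capacity."""
    rng = random.Random(seed)
    placement = {}                                 # random hash: item -> set
    sets = [OrderedDict() for _ in range(num_sets)]
    ways = capacity // num_sets
    misses = 0
    for x in requests:
        s = placement.setdefault(x, rng.randrange(num_sets))
        cache = sets[s]
        if x in cache:
            cache.move_to_end(x)                   # LRU promotion on a hit
        else:
            misses += 1
            if len(cache) >= ways:
                cache.popitem(last=False)          # evict the LRU line
            cache[x] = None
    return misses

k = 1024
reqs = [random.randrange(4 * k) for _ in range(50_000)]
full = lru_misses(reqs, k, num_sets=1)             # fully associative baseline
for alpha in (4, 16, 64):                          # alpha-way: k/alpha sets
    sa = lru_misses(reqs, k, num_sets=k // alpha)
    print(f"alpha={alpha}: {sa / full:.3f}x the fully associative misses")
```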

    Adaptive Microarchitectural Optimizations to Improve Performance and Security of Multi-Core Architectures

    With the current technological barriers, microarchitectural optimizations are increasingly important to ensure performance scalability of computing systems. The shift to multi-core architectures increases the demands on the memory system and amplifies the role of microarchitectural optimizations in performance improvement. In a multi-core system, microarchitectural resources such as the cache are usually shared to maximize utilization, but sharing can also lead to contention and lower performance. This can be mitigated through partitioning of shared caches. However, microarchitectural optimizations, long assumed to be fundamentally secure, can be used in side-channel attacks to exploit secrets such as cryptographic keys. Timing-based side channels exploit predictable timing variations due to the interaction with microarchitectural optimizations during program execution. Going forward, there is a strong need to leverage microarchitectural optimizations for performance without compromising security. This thesis contributes three adaptive microarchitectural resource management optimizations to improve the security and/or performance of multi-core architectures, and a systematization of knowledge of timing-based side-channel attacks.

    We observe that achieving high-performance cache partitioning in a multi-core system requires meeting three requirements: i) fine granularity of partitions, ii) locality-aware placement, and iii) frequent changes. These requirements lead to high overheads for current centralized partitioning solutions, especially as the number of cores in the system increases. To address this problem, we present an adaptive and scalable cache partitioning solution (DELTA) using a distributed and asynchronous allocation algorithm. Allocations occur through core-to-core challenges, in which applications with a larger performance benefit gain cache capacity. The solution is implementable in hardware, due to its low computational complexity, and can scale to large core counts.

    According to our analysis, better performance can be achieved by coordinating multiple optimizations for different resources, e.g., off-chip bandwidth and cache, but doing so is challenging due to the increased number of possible allocations that need to be evaluated. Based on these observations, we present a solution (CBP) for coordinated management of three optimizations: cache partitioning, bandwidth partitioning and prefetching. Efficient allocations, considering the inter-resource interactions and trade-offs, are achieved using local resource managers to limit the solution space.

    The continuously growing number of side-channel attacks leveraging microarchitectural optimizations prompts us to review attacks and defenses to understand the vulnerabilities of different microarchitectural optimizations. We identify four root causes of timing-based side-channel attacks: determinism, sharing, access violation and information flow. Our key insight is that eliminating any of the exploited root causes, in any of the attack steps, is enough to provide protection. Based on our framework, we present a systematization of the attacks and defenses on a wide range of microarchitectural optimizations, which highlights their key similarities.

    Shared caches are an attractive attack surface for side-channel attacks, while defenses need to be efficient since the cache is crucial for performance. To address this issue, we present an adaptive and scalable cache partitioning solution (SCALE) for protection against cache side-channel attacks. The solution leverages randomness and provides quantifiable, information-theoretic security guarantees using differential privacy. It closes the performance gap to a state-of-the-art non-secure allocation policy for a mix of secure and non-secure applications.
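
    A loose sketch of the differential-privacy idea behind a SCALE-style allocation (this is not the thesis's algorithm; the Laplace mechanism, parameters, and rounding policy below are all assumptions): each core's measured cache demand is perturbed with Laplace noise before the ways are divided, so the observable allocation reveals little about any one core's exact usage.

```python
import random

def dp_way_allocation(demands, total_ways, epsilon=1.0, seed=0):
    """Illustrative differentially-private partitioning: perturb each
    core's measured demand with Laplace noise before dividing ways.
    (A sketch of the idea only, not the SCALE algorithm itself.)"""
    rng = random.Random(seed)
    # Laplace(1/epsilon) noise as the difference of two exponentials.
    noisy = [max(d + rng.expovariate(epsilon) - rng.expovariate(epsilon), 0.01)
             for d in demands]
    total = sum(noisy)
    alloc = [max(1, round(total_ways * n / total)) for n in noisy]
    # Trim rounding overshoot from the largest allocations.
    while sum(alloc) > total_ways:
        alloc[alloc.index(max(alloc))] -= 1
    return alloc

# Four cores with skewed demand sharing a 16-way cache.
print(dp_way_allocation([10.0, 3.0, 1.0, 1.0], total_ways=16))
```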

    TPPD: Targeted Pseudo Partitioning based Defence for cross-core covert channel attacks

    Contemporary computing employs a cache hierarchy to fill the speed gap between processors and main memory. To optimise system performance, Last Level Caches (LLC) are shared among all the cores. Cache sharing has made the LLC an attractive surface for cross-core timing channel attacks, in which an attacker running on another core exploits the access timing of a victim process to infer secret information. One such attack is the cross-core Covert Channel Attack (CCA). Timely detection and prevention of cross-core CCA is critical for maintaining user integrity and security, especially in a shared computing environment. In this work, we propose an efficient cross-core CCA mitigation technique: way-wise cache partitioning on targeted sets, applied only to processes suspected of being attackers. In this way, the performance impact on the entire LLC is minimised, and benign applications can utilise the LLC to its full capacity. We use a cycle-accurate simulator (gem5) to analyse the performance and security effectiveness of the proposed method. It abolishes the cross-core covert timing channel with no significant performance impact on benign applications, causes 23% fewer cache misses than existing partitioning-based solutions, and requires only ≈0.26% storage overhead.
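
    An illustrative model of targeted pseudo-partitioning (the class name, reserved-way policy, and LRU encoding are assumptions, not the paper's design): only sets flagged as targeted are partitioned, and only suspected processes are confined to the reserved ways, so the rest of the LLC behaves as a normal shared cache.

```python
class TPPDSet:
    """Toy model of targeted pseudo-partitioning: in a set flagged as
    under attack, a suspected process is confined to a reserved subset
    of ways; elsewhere the LLC behaves normally."""

    def __init__(self, ways: int, reserved: int, targeted: bool):
        self.ways = ways
        self.reserved = reserved      # ways set aside for the suspect
        self.targeted = targeted      # only targeted sets are partitioned

    def victim_way(self, suspect: bool, lru_order: list) -> int:
        """Pick an eviction way, honoring the partition on targeted sets.
        lru_order lists way indices from oldest to youngest."""
        if not self.targeted:
            return lru_order[0]                        # plain LRU victim
        pool = (range(self.ways - self.reserved, self.ways) if suspect
                else range(self.ways - self.reserved))
        for way in lru_order:                          # oldest eligible way
            if way in pool:
                return way
        return lru_order[0]                            # fallback: plain LRU
```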