17 research outputs found
Enabling hardware randomization across the cache hierarchy in Linux-Class processors
The most promising secure-cache design approaches use cache-set randomization to index cache contents, thus thwarting cache side-channel attacks. Unfortunately, existing randomization proposals cannot be successfully applied to processors' cache hierarchies due to the overhead added when dealing with coherence and virtual memory. In this paper, we solve the existing limitations of hardware randomization approaches and propose a cost-effective randomization implementation for the whole cache hierarchy of a Linux-capable RISC-V processor.

This work has been supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328). The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. We also thank Red-RISCV for the efforts to promote activities around open hardware. This work has received funding from the EU Horizon2020 programme under grant agreement no. 871467 (SELENE). M. Doblas has been partially supported by the Agency for Management of University and Research Grants (AGAUR) of the Government of Catalonia under Beques de Col·laboració d'estudiants en departaments universitaris per al curs 2019-2020. V. Kostalabros has been partially supported by the Agency for Management of University and Research Grants (AGAUR) of the Government of Catalonia under Ajuts per a la contractació de personal investigador novell fellowship number 2019FI_B01274. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104.
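To make the idea of cache-set randomization concrete, here is a small sketch (our own illustration, not the paper's hardware design): conventional caches derive the set index from low-order address bits, while a randomized cache derives it from a keyed hash of the line address, so an attacker without the key cannot construct addresses that are guaranteed to contend for the same set. The geometry, key handling, and function names are assumptions for the example.

```python
# Sketch of cache-set randomization via a keyed hash (illustrative only).
import hashlib

NUM_SETS = 1024     # illustrative cache geometry
LINE_BYTES = 64

def classic_set_index(addr: int) -> int:
    """Conventional indexing: low-order bits above the line offset."""
    return (addr // LINE_BYTES) % NUM_SETS

def randomized_set_index(addr: int, key: bytes) -> int:
    """Keyed-hash indexing: the same address and key always map to the
    same set, but without the key the mapping looks random."""
    line = addr // LINE_BYTES
    h = hashlib.blake2b(line.to_bytes(8, "little"), key=key, digest_size=4)
    return int.from_bytes(h.digest(), "little") % NUM_SETS

key = b"per-boot secret"
a, b = 0x1000, 0x1000 + NUM_SETS * LINE_BYTES
# Under classic indexing, a and b are congruent and collide in one set;
# under keyed-hash indexing that collision is no longer predictable.
print(classic_set_index(a) == classic_set_index(b))   # True
print(randomized_set_index(a, key), randomized_set_index(b, key))
```

Changing the key (e.g., per boot or per security domain) re-randomizes the mapping, which is what invalidates an attacker's precomputed eviction sets.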
Random and Safe Cache Architecture to Defeat Cache Timing Attacks
Caches have been exploited to leak secret information due to the different times they take to handle memory accesses. Cache timing attacks include non-speculative cache side- and covert-channel attacks and cache-based speculative execution attacks. We first present a systematic view of the attack and defense space and show that no existing defense has addressed both the speculative and non-speculative cache timing attack families, which we do in this paper. We propose Random and Safe (RaS) cache architectures to decorrelate cache state changes from memory requests. RaS fills the cache with "safe" cache lines that are likely to be used in the future, rather than with demand-fetched, security-sensitive lines. RaS captures a group of safe addresses during runtime and fetches addresses randomly displaced from these addresses. The proposed RaS architecture is flexible, allowing security-performance trade-offs. We show different designs of RaS architectures that can defeat cache side-channel attacks and cache-based speculative execution attacks. The RaS variant against cache-based speculative execution attacks has a 4.2% average performance overhead, and other RaS variants against both attack families have 7.9% to 45.2% average overhead. For some benchmarks, RaS defenses improve performance while providing security.
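The fill policy described above can be modelled with a toy sketch (our own interpretation of the abstract, not the authors' hardware; the class, window size, and displacement range are illustrative assumptions): instead of caching the demanded, security-sensitive line, the cache is filled with a line randomly displaced from a recently captured safe address.

```python
# Toy model of a RaS-style fill: decorrelate fills from demand accesses.
import random

class RaSFillSketch:
    def __init__(self, window=8, max_disp=4):
        self.safe = []            # recently captured safe line addresses
        self.window = window      # how many safe addresses to remember
        self.max_disp = max_disp  # displacement range, in cache lines

    def observe_safe(self, line_addr: int):
        """Record a safe address, keeping only the most recent window."""
        self.safe.append(line_addr)
        self.safe = self.safe[-self.window:]

    def pick_fill(self) -> int:
        """Choose a line to fetch: a random safe address plus a random
        displacement, rather than the demand-fetched sensitive line."""
        base = random.choice(self.safe)
        return base + random.randint(-self.max_disp, self.max_disp)

ras = RaSFillSketch()
for a in (100, 200, 300):
    ras.observe_safe(a)
print(ras.pick_fill())
```

Because the filled line is chosen from the safe set rather than the demand stream, an observer watching cache-state changes learns nothing about which sensitive address was actually accessed.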
New Cache Attacks and Defenses
The sharing of the last-level cache (LLC) among different physical cores makes the cache vulnerable to side-channel attacks. An attacker can obtain private information about co-running applications (victims) by monitoring their accesses. Cache side-channel attacks can be mitigated by partitioning the cache between the victim and the attacker. However, previous partitioning work makes the incorrect assumption that only the victim's cache misses are visible to attackers.
In this work, we provide the key insight that both cache hits and cache misses from the victim are vulnerable. Although a cache hit does not affect the existence state, it can still change the replacement state and coherence state, which can also leak information to attackers. Based on this, we propose the Invisible-Victim cache (IVcache), a new cache design that can mitigate both traditional LLC attacks and the new variants. IVcache classifies all processes as protected or unprotected. For accesses from protected processes, IVcache handles state changes in a slightly different way to make those accesses invisible to any other process. We evaluate the defense effectiveness and performance of IVcache in the gem5 simulator. We show that IVcache can defend against real-world attacks, and that it introduces negligible performance overhead for both protected and unprotected processes.
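The hit-state insight above can be illustrated with a toy LRU model (a sketch of the general idea only, not the IVcache design, which also covers coherence state and misses): if a protected process's hit skips the recency update, an observer probing the replacement state cannot tell whether the line was touched.

```python
# Toy LRU cache where protected hits leave replacement state unchanged.
from collections import OrderedDict

class InvisibleHitLRU:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> data, in LRU order

    def access(self, addr, protected=False):
        hit = addr in self.lines
        if hit:
            if not protected:
                self.lines.move_to_end(addr)  # normal LRU promotion
            # protected hit: replacement state deliberately untouched
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict LRU line
            self.lines[addr] = None
        return hit

c = InvisibleHitLRU()
for a in (1, 2, 3, 4):
    c.access(a)
c.access(1, protected=True)   # hit, but no observable state change
print(list(c.lines))          # [1, 2, 3, 4]: order unchanged
c.access(2)                   # unprotected hit promotes line 2
print(list(c.lines))          # [1, 3, 4, 2]
```

The contrast between the two probes shows exactly the channel the abstract identifies: an ordinary hit perturbs the eviction order, while the protected hit is indistinguishable from no access at all.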
A Case for Reversible Coherence Protocol
We propose the first Reversible Coherence Protocol (RCP), a new protocol designed from the ground up to enable invisible speculative loads. RCP takes a bold approach by including speculative loads and merge/purge operations in the interface between the processor and cache coherence, and by allowing them to participate in the coherence protocol. This means that speculative loads, ordinary loads/stores, and merge/purge operations can all affect the state of a given cache line. RCP is the first coherence protocol that enables the commit and squash of speculative loads among distributed cache components in a general memory hierarchy. RCP incurs an average slowdown of (3.0%, 8.3%, 7.4%) on (SPEC2006, SPEC2017, PARSEC), which is lower than the (26.5%, 12%, 18.3%) of InvisiSpec and the (3.2%, 9.4%, 24.2%) of CleanupSpec. The coherence traffic overhead is on average 46%, compared to 40% and 27% for InvisiSpec and CleanupSpec, respectively. Even with the higher traffic overhead (~46%), the performance overhead of RCP is lower than InvisiSpec's and comparable to CleanupSpec's. This reveals a key advantage of RCP: the coherence actions triggered by the merge and purge operations are not on the critical path of execution and can be performed in the cache hierarchy concurrently with processor execution.
An Associativity Threshold Phenomenon in Set-Associative Caches
In a k-way set-associative cache of total size n, the cache is partitioned into n/k disjoint sets of size k, and each item can only be cached in one set, typically selected via a hash function. Set-associative caches are widely used and have many benefits, e.g., in terms of latency or concurrency, over fully associative caches, but they often incur more cache misses. As the set size decreases, the benefits increase, but the paging costs worsen.
In this paper we characterize the performance of a k-way set-associative LRU cache of total size n, as a function of the associativity k. We prove the following, assuming that sets are selected using a fully random hash function:
- Above a threshold associativity, the paging cost of a k-way set-associative LRU cache is within a small additive term of that of a fully-associative LRU cache of the same total size, with high probability, for all request sequences of bounded (e.g., polynomial) length.
- Below that threshold, the paging cost of a k-way set-associative LRU cache can fail to be within any fixed factor of that of a fully-associative LRU cache of the same total size, for some request sequence.
- Below the threshold, if the hash function can be occasionally changed, the paging cost of a k-way set-associative LRU cache is within a bounded factor of that of a fully-associative LRU cache of the same total size, with high probability, for request sequences of arbitrary (e.g., super-polynomial) length.
Some of our results generalize to other paging algorithms besides LRU, such as least-frequently used (LFU).
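The set-associative versus fully-associative comparison above can be explored empirically with a small, self-contained simulation (illustrative only; the paper's bounds are analytic). Each set behaves as an independent LRU cache of size k, and items are assigned to sets by a random hash:

```python
# Compare miss counts: fully-associative LRU vs. k-way set-associative LRU.
import random
from collections import OrderedDict

def lru_misses(requests, capacity):
    """Miss count of a fully-associative LRU cache of the given capacity."""
    cache, misses = OrderedDict(), 0
    for r in requests:
        if r in cache:
            cache.move_to_end(r)           # promote to most-recently-used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least-recently-used
            cache[r] = None
    return misses

def set_assoc_misses(requests, n, k, seed=0):
    """Miss count of a k-way set-associative LRU cache of total size n,
    with each item assigned to one of n/k sets by a random hash."""
    rng = random.Random(seed)
    num_sets = n // k
    set_of = {}                              # random hash: item -> set index
    per_set = [[] for _ in range(num_sets)]
    for r in requests:
        if r not in set_of:
            set_of[r] = rng.randrange(num_sets)
        per_set[set_of[r]].append(r)
    # each set is an independent LRU cache of size k
    return sum(lru_misses(seq, k) for seq in per_set)

rng1 = random.Random(1)
reqs = [rng1.randrange(200) for _ in range(5000)]
n, k = 64, 8
print(lru_misses(reqs, n), set_assoc_misses(reqs, n, k))
```

Sweeping k for a fixed n in this simulation gives a feel for where the extra misses of small sets start to dominate, which is the phenomenon the paper pins down analytically.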
Adaptive Microarchitectural Optimizations to Improve Performance and Security of Multi-Core Architectures
With the current technological barriers, microarchitectural optimizations are increasingly important to ensure performance scalability of computing systems. The shift to multi-core architectures increases the demands on the memory system and amplifies the role of microarchitectural optimizations in performance improvement. In a multi-core system, microarchitectural resources, such as the cache, are usually shared to maximize utilization, but sharing can also lead to contention and lower performance. This can be mitigated through partitioning of shared caches. However, microarchitectural optimizations, which were long assumed to be fundamentally secure, can be used in side-channel attacks to exploit secrets such as cryptographic keys. Timing-based side-channels exploit predictable timing variations caused by the interaction with microarchitectural optimizations during program execution. Going forward, there is a strong need to leverage microarchitectural optimizations for performance without compromising security. This thesis contributes three adaptive microarchitectural resource-management optimizations that improve the security and/or performance of multi-core architectures, and a systematization of knowledge of timing-based side-channel attacks.

We observe that to achieve high-performance cache partitioning in a multi-core system, three requirements need to be met: i) fine granularity of partitions, ii) locality-aware placement and iii) frequent changes. These requirements lead to high overheads for current centralized partitioning solutions, especially as the number of cores in the system increases. To address this problem, we present an adaptive and scalable cache partitioning solution (DELTA) using a distributed and asynchronous allocation algorithm. The allocations occur through core-to-core challenges, where applications with larger performance benefit gain cache capacity. The solution is implementable in hardware, due to its low computational complexity, and can scale to large core counts.

According to our analysis, better performance can be achieved by coordinating multiple optimizations for different resources, e.g., off-chip bandwidth and cache, but this is challenging due to the increased number of possible allocations that need to be evaluated. Based on these observations, we present a solution (CBP) for coordinated management of three optimizations: cache partitioning, bandwidth partitioning and prefetching. Efficient allocations, considering the inter-resource interactions and trade-offs, are achieved using local resource managers to limit the solution space.

The continuously growing number of side-channel attacks leveraging microarchitectural optimizations prompts us to review attacks and defenses to understand the vulnerabilities of different microarchitectural optimizations. We identify four root causes of timing-based side-channel attacks: determinism, sharing, access violation and information flow. Our key insight is that eliminating any of the exploited root causes, in any of the attack steps, is enough to provide protection. Based on our framework, we present a systematization of the attacks and defenses on a wide range of microarchitectural optimizations, which highlights their key similarities. Shared caches are an attractive attack surface for side-channel attacks, while defenses need to be efficient since the cache is crucial for performance. To address this issue, we present an adaptive and scalable cache partitioning solution (SCALE) for protection against cache side-channel attacks. The solution leverages randomness and provides quantifiable, information-theoretic security guarantees using differential privacy. It closes the performance gap to a state-of-the-art non-secure allocation policy for a mix of secure and non-secure applications.
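A distributed, challenge-based allocation in the spirit of the DELTA description above can be sketched as follows (our own toy model; the benefit estimates, round structure, and all names are illustrative assumptions, not the thesis's algorithm): in each round two cores are paired, and the core with the larger marginal benefit per way takes one way from its opponent.

```python
# Toy challenge-based cache-way allocation: winners of pairwise
# challenges gain capacity, without any centralized allocator.
import random

def challenge_round(ways, benefit, rng):
    """One core challenges another; the core with larger marginal
    benefit per extra way gains one way from its opponent."""
    a, b = rng.sample(range(len(ways)), 2)
    if benefit[a] > benefit[b] and ways[b] > 1:
        ways[a] += 1
        ways[b] -= 1
    elif benefit[b] > benefit[a] and ways[a] > 1:
        ways[b] += 1
        ways[a] -= 1

rng = random.Random(0)
ways = [4, 4, 4, 4]               # 16-way LLC shared by 4 cores
benefit = [9.0, 1.0, 3.0, 0.5]    # assumed marginal benefit of an extra way
for _ in range(100):
    challenge_round(ways, benefit, rng)
print(ways, sum(ways))            # capacity is conserved across rounds
```

Because each round only involves two cores and a local comparison, the mechanism needs no global view of all possible allocations, which is what makes this style of allocation scalable to large core counts.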
TPPD: Targeted Pseudo Partitioning based Defence for cross-core covert channel attacks
Contemporary computing employs a cache hierarchy to fill the speed gap between processors and main memory. In order to optimise system performance, Last-Level Caches (LLCs) are shared among all the cores. Cache sharing has made them an attractive surface for cross-core timing channel attacks, in which an attacker running on another core exploits the access timing of the victim process to infer secret information. One such attack is the cross-core Covert Channel Attack (CCA). Timely detection and prevention of cross-core CCAs is critical for maintaining the integrity and security of users, especially in a shared computing environment. In this work, we propose an efficient cross-core CCA mitigation technique: way-wise cache partitioning on targeted sets, applied only to the processes suspected to be attackers. In this way, the performance impact on the entire LLC is minimised, and benign applications can utilise the LLC to its full capacity. We have used a cycle-accurate simulator (gem5) to analyse the performance of the proposed method and its security effectiveness. The method abolishes the cross-core covert timing channel attack with no significant performance impact on benign applications, and it causes 23% fewer cache misses than existing partitioning-based solutions while requiring ≈0.26% storage overhead.
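The targeted, way-wise partitioning idea above can be sketched in a few lines (a hedged illustration of the concept, not the TPPD mechanism itself; the geometry, reserved ways, and function names are assumptions): only the sets under suspicion are partitioned by ways, confining the suspected attacker, while every other set remains fully shared.

```python
# Sketch of way-wise partitioning applied only to targeted cache sets.
NUM_WAYS = 8
ATTACKER_WAYS = {6, 7}   # ways reserved for the suspect in targeted sets

def allowed_ways(set_idx, proc_is_suspect, targeted_sets):
    """Ways a process may allocate into for a given set."""
    if set_idx not in targeted_sets:
        return set(range(NUM_WAYS))           # untouched sets: full LLC
    if proc_is_suspect:
        return ATTACKER_WAYS                  # suspect confined to its ways
    return set(range(NUM_WAYS)) - ATTACKER_WAYS

targeted = {42}
print(allowed_ways(7, True, targeted))    # non-targeted set: all 8 ways
print(allowed_ways(42, True, targeted))   # suspect in targeted set: {6, 7}
print(allowed_ways(42, False, targeted))  # benign process: remaining ways
```

Since the suspect and benign processes never share ways inside a targeted set, the suspect cannot evict a victim's lines there, yet the rest of the LLC is untouched, which is why the performance impact stays low.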