Application-Specific Memory Subsystems
The disparity in performance between processors and main memories has
led computer architects to incorporate large cache hierarchies in
modern computers. These cache hierarchies are designed to be
general-purpose in that they strive to provide the best possible
performance across a wide range of applications. However, such a memory
subsystem does not necessarily provide the best possible performance for
a particular application.
General-purpose memory subsystems are desirable when the workload is unknown
and the memory subsystem must remain fixed; when this is not the case, a
custom memory subsystem may be beneficial.
For example, in an application-specific integrated circuit (ASIC) or
a field-programmable gate array (FPGA) designed to run a particular
application, a custom memory subsystem optimized for that application
would be desirable. In addition, when there are tunable
parameters in the memory subsystem, it may make sense to change these
parameters depending on the application being run. Such a situation
arises today with FPGAs and, to a lesser extent, GPUs, and it is
plausible that general-purpose computers will begin to support
greater flexibility in the memory subsystem in the future.
In this dissertation, we first show that it is possible to create
application-specific memory subsystems that provide much better
performance than a general-purpose memory subsystem. In addition,
we show a way to discover such memory subsystems automatically using
a superoptimization technique on memory address traces gathered
from applications. This allows one to generate a custom memory subsystem
with little effort.
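A minimal sketch of what such trace-driven search looks like (the configuration space, cost model, and all names here are illustrative assumptions, not the dissertation's actual superoptimizer): enumerate candidate cache organizations and score each against a recorded address trace.

```scala
// Hypothetical sketch: pick the cache configuration that minimizes estimated
// cycles on a recorded address trace. A real superoptimizer searches a far
// richer design space (multi-level hierarchies, scratchpads, prefetchers).
object SubsystemSearch {
  import scala.collection.mutable.ArrayBuffer

  case class CacheConfig(lineBytes: Int, sets: Int, ways: Int)

  // Simulate an LRU set-associative cache over the trace; return miss count.
  def misses(trace: Seq[Long], c: CacheConfig): Long = {
    val sets = Array.fill(c.sets)(ArrayBuffer.empty[Long])
    var missCount = 0L
    for (addr <- trace) {
      val line = addr / c.lineBytes
      val set  = sets((line % c.sets).toInt)
      val hit  = set.indexOf(line)
      if (hit >= 0) set.remove(hit)        // hit: refresh LRU position
      else {                               // miss: evict LRU if the set is full
        missCount += 1
        if (set.size >= c.ways) set.remove(0)
      }
      set += line
    }
    missCount
  }

  def main(args: Array[String]): Unit = {
    val trace = (0 until 10000).map(i => (i % 512) * 64L) // toy address trace
    val candidates = for {
      lb <- Seq(32, 64); s <- Seq(64, 128, 256); w <- Seq(1, 2, 4)
    } yield CacheConfig(lb, s, w)
    // Assumed cost model: hits cost 1 cycle, misses 100 (main-memory latency).
    val best = candidates.minBy(c => trace.size + 99L * misses(trace, c))
    println(s"best config: $best")
  }
}
```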
We next show that our memory subsystem superoptimization technique can
be used to optimize for objectives other than performance. As an example,
we show that it is possible to reduce the number of writes to the main
memory, which can be useful for main memories with limited write
durability, such as flash or Phase-Change Memory (PCM).
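Changing the objective is then only a change to the scoring function. A sketch under stated assumptions (write-back, write-allocate caches; it reuses CacheConfig from the sketch above and extends trace entries with a read/write flag): count dirty-line evictions, the writes that actually reach flash or PCM, rather than estimated cycles.

```scala
// Hypothetical scorer for a write-limited main memory: a candidate is better
// when fewer dirty lines are evicted. Drops into the SubsystemSearch object
// above; trace entries are (address, isWrite).
def mainMemoryWrites(trace: Seq[(Long, Boolean)], c: CacheConfig): Long = {
  val sets =
    Array.fill(c.sets)(scala.collection.mutable.LinkedHashMap.empty[Long, Boolean])
  var writes = 0L
  for ((addr, isWrite) <- trace) {
    val line  = addr / c.lineBytes
    val set   = sets((line % c.sets).toInt)
    val dirty = set.remove(line).getOrElse(false) // hit: keep the old dirty bit
    if (set.size >= c.ways) {                     // miss on a full set:
      val (victim, victimDirty) = set.head        // evict the LRU entry
      set.remove(victim)
      if (victimDirty) writes += 1                // dirty eviction reaches main memory
    }
    set.put(line, dirty || isWrite)               // (re)insert as most recent
  }
  writes
}
```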
Finally, we show how to superoptimize memory subsystems for streaming
applications, which are a class of parallel applications. In particular, we
show that, through the use of ScalaPipe, we can author and deploy streaming
applications targeting FPGAs with superoptimized memory subsystems.
ScalaPipe is a domain-specific language (DSL) embedded in the Scala
programming language for generating streaming applications that can be
implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we
are able to demonstrate actual performance improvements using the
superoptimized memory subsystem with applications implemented in hardware.
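For flavor, here is a plain-Scala sketch of the streaming model that ScalaPipe embeds; this is not ScalaPipe's actual API, only an illustration of its kernel-and-channel structure: independent kernels connected by typed FIFO channels, so each stage can be mapped to a CPU thread or an FPGA pipeline stage with its own memory subsystem.

```scala
// Illustrative only (not the ScalaPipe API): kernels as threads, channels as
// bounded FIFOs. On an FPGA the same topology becomes hardware pipeline stages.
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

object StreamingSketch {
  // A one-input, one-output kernel applying f to every item that arrives.
  def kernel[A, B](in: ArrayBlockingQueue[A],
                   out: ArrayBlockingQueue[B])(f: A => B): Thread =
    new Thread(() => Iterator.continually(in.take()).foreach(a => out.put(f(a))))

  def main(args: Array[String]): Unit = {
    val q1 = new ArrayBlockingQueue[Int](16)  // FIFO channels between kernels
    val q2 = new ArrayBlockingQueue[Int](16)
    val square = kernel(q1, q2)(x => x * x)
    square.setDaemon(true)
    square.start()
    (1 to 5).foreach(q1.put)
    (1 to 5).foreach(_ => println(q2.poll(1, TimeUnit.SECONDS)))
  }
}
```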
DR.SGX: Hardening SGX Enclaves against Cache Attacks with Data Location Randomization
Recent research has demonstrated that Intel's SGX is vulnerable to various
software-based side-channel attacks. In particular, attacks that monitor CPU
caches shared between the victim enclave and untrusted software enable accurate
leakage of secret enclave data. Known defenses assume developer assistance,
require hardware changes, impose high overhead, or prevent only some of the
known attacks. In this paper we propose data location randomization as a novel
defensive approach to address the threat of side-channel attacks. Our main goal
is to break the link between the cache observations by the privileged adversary
and the actual data accesses by the victim. We design and implement a
compiler-based tool called DR.SGX that instruments enclave code such that data
locations are permuted at the granularity of cache lines. We realize the
permutation with the CPU's cryptographic hardware-acceleration units providing
secure randomization. To prevent correlation of repeated memory accesses we
continuously re-randomize all enclave data during execution. Our solution
effectively protects many (but not all) enclaves from cache attacks and
provides a complementary enclave hardening technique that is especially useful
against unpredictable information leakage.
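A toy model of the core idea (DR.SGX performs the permutation with the CPU's AES hardware and re-randomizes it continuously; here a keyed Fisher-Yates shuffle over an ordinary array stands in for both): every logical cache line is mapped through a key-dependent permutation before each access, so the physical line an adversary observes reveals nothing about the logical location.

```scala
// Toy data-location randomization at cache-line granularity. The permutation
// key plays the role of DR.SGX's AES key; 64 lines and a byte-array "memory"
// are illustrative stand-ins for enclave data.
object LineRandomizer {
  val LineBytes = 64

  // Keyed permutation over line indices: logical line -> physical line.
  def permutation(lines: Int, key: Long): Array[Int] = {
    val p   = Array.tabulate(lines)(identity)
    val rng = new scala.util.Random(key)
    for (i <- lines - 1 to 1 by -1) {        // Fisher-Yates shuffle
      val j = rng.nextInt(i + 1)
      val t = p(i); p(i) = p(j); p(j) = t
    }
    p
  }

  def main(args: Array[String]): Unit = {
    val mem  = new Array[Byte](64 * LineBytes)
    val perm = permutation(64, key = 0xC0FFEE)
    def physAddr(logical: Int): Int =        // translate before every access
      perm(logical / LineBytes) * LineBytes + logical % LineBytes
    mem(physAddr(130)) = 42                  // logical byte 130 lands in a
    println(perm(130 / LineBytes))           // key-dependent physical line
  }
}
```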
Software Grand Exposure: SGX Cache Attacks Are Practical
Side-channel information leakage is a known limitation of SGX. Researchers
have demonstrated that secret-dependent information can be extracted from
enclave execution through page-fault access patterns. Consequently, various
recent research efforts are actively seeking countermeasures to SGX
side-channel attacks. It is widely assumed that SGX may be vulnerable to other
side channels, such as cache access pattern monitoring, as well. However, prior
to our work, the practicality and the extent of such information leakage were
not studied.
In this paper we demonstrate that cache-based attacks are indeed a serious
threat to the confidentiality of SGX-protected programs. Our goal was to design
an attack that is hard to mitigate using known defenses, and therefore we mount
our attack without interrupting enclave execution. This approach has major
technical challenges, since the existing cache monitoring techniques experience
significant noise if the victim process is not interrupted. We designed and
implemented novel attack techniques to reduce this noise by leveraging the
capabilities of the privileged adversary. Our attacks are able to recover
confidential information from SGX enclaves, which we illustrate in two example
cases: extraction of an entire RSA-2048 key during RSA decryption, and
detection of specific human genome sequences during genomic indexing. We show
that our attacks are more effective than previous cache attacks and harder to
mitigate than previous SGX side-channel attacks.
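The monitoring primitive behind such attacks is simple: an address's cache set is a fixed bit field of the address, so an adversary who has primed a set can tell when the victim touches that set. A sketch of the index arithmetic (geometry assumed: 64-byte lines, 64 sets; real last-level caches add a hash-based slice selection):

```scala
// Why secret-dependent memory access leaks through the cache: the accessed
// set index is a deterministic function of the address.
object SetIndex {
  def cacheSet(addr: Long, lineBytes: Int = 64, sets: Int = 64): Int =
    ((addr / lineBytes) % sets).toInt

  def main(args: Array[String]): Unit = {
    val tableBase = 0x7f0000000000L          // hypothetical lookup-table base
    // A lookup table(secret) touches a set determined by the secret, so
    // observing which primed set was evicted reveals secret mod 64.
    for (secret <- Seq(3, 17, 42))
      println(s"secret=$secret -> cache set ${cacheSet(tableBase + secret * 64L)}")
  }
}
```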
A low-power cache system for high-performance processors
Degree system: new; report number: Kou 3439; degree type: Doctor of Engineering; date conferred: 12-Sep-2011; Waseda University degree number: Shin 576.
Variable-based multi-module data caches for clustered VLIW processors
Memory structures consume an important fraction of the total processor energy. One solution to reduce the energy consumed by cache memories consists of reducing their supply voltage and/or increasing their threshold voltage, at the expense of access time. We propose to divide the L1 data cache into two cache modules for a clustered VLIW processor consisting of two clusters. This division is done on a variable basis, so that the address of a datum determines its location. Each cache module is assigned to a cluster and can be set up as a fast power-hungry module or as a slow power-aware module. We also present compiler techniques to distribute variables between the two cache modules and generate code accordingly. We have explored several cache configurations using the Mediabench suite and observed that the best distributed cache organization outperforms traditional cache organizations by 19%-31% in energy-delay and by 11%-29% in energy-delay-squared. In addition, we also explore a reconfigurable distributed cache, where the cache can be reconfigured on a context switch. This reconfigurable scheme further outperforms the best previous distributed organization by 3%-4%.
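A small sketch of the routing this implies (placement and addresses hypothetical; in the paper the compiler distributes variables using profile information): each variable lives wholly in one module, so the address of a datum alone selects the fast power-hungry module or the slow power-aware one.

```scala
// Hypothetical variable-based routing between two L1 cache modules.
object ModuleRouter {
  sealed trait Module
  case object Fast extends Module   // low-latency, power-hungry module
  case object Slow extends Module   // slower, power-aware module

  // Compile-time placement from profile data: hot variables -> Fast.
  val placement = Seq(
    (0x1000L until 0x2000L) -> Fast,  // e.g. a hot array
    (0x2000L until 0x9000L) -> Slow   // e.g. a cold buffer
  )

  def moduleOf(addr: Long): Module =
    placement.collectFirst { case (r, m) if r.contains(addr) => m }.getOrElse(Slow)

  def main(args: Array[String]): Unit =
    Seq(0x1040L, 0x3000L).foreach(a => println(f"0x$a%x -> ${moduleOf(a)}"))
}
```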
Split array and scalar data cache: A comprehensive study of data cache organization.
Existing cache organizations suffer from the inability to distinguish different types of localities, and non-selectively cache all data rather than taking special advantage of the locality type. This causes unnecessary movement of data among the levels of the memory hierarchy and increases the miss ratio. In this dissertation I propose a split data cache architecture that groups memory accesses as scalar or array references according to their inherent locality and subsequently maps each group to a dedicated cache partition. In this system, because scalar and array references no longer negatively affect each other, cache interference is diminished, delivering better performance. Further improvement is achieved by the introduction of victim caches, prefetching, data flattening, and reconfigurability to tune the array and scalar caches for a specific application. The most significant contribution of my work is the introduction of a novel cache architecture for embedded microprocessor platforms. My proposed cache architecture uses reconfigurability coupled with split data caches to reduce the area and power consumed by cache memories while retaining performance gains. My results show excellent reductions in both memory size and memory access times, translating into reduced power consumption. Since there was a large reduction in miss rates at the L1 caches, further power reduction is achieved by partially or completely shutting down the L2 data or L2 instruction caches. The savings in cache size resulting from these designs can be used for other processor activities, including instruction and data prefetching and branch-prediction buffers. The potential benefits of such techniques for embedded applications have been evaluated in my work. I also explore how my cache organization performs for non-numeric data structures. I propose a novel idea called "data flattening," a profile-based memory allocation technique that compresses sparsely scattered pointer data into regular contiguous memory locations, and explore the potential of my proposed split cache organization for data treated with the data flattening method.
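The interference argument is easy to see in a toy model (partition sizes and the LRU policy here are assumptions; the dissertation classifies references by their inherent locality): once scalar and array references hit separate partitions, a streaming array walk can no longer evict a live scalar.

```scala
// Toy split data cache: two independent LRU partitions, one for scalars and
// one for arrays, demonstrating that array streaming leaves scalars cached.
object SplitCache {
  import scala.collection.mutable.LinkedHashSet

  final class Lru(capacity: Int) {
    private val lines = LinkedHashSet.empty[Long]
    def access(line: Long): Boolean = {       // returns true on hit
      val hit = lines.remove(line)
      if (!hit && lines.size >= capacity) lines -= lines.head  // evict LRU
      lines += line
      hit
    }
  }

  val scalarCache = new Lru(8)    // small partition dedicated to scalars
  val arrayCache  = new Lru(64)   // separate partition for array data

  def main(args: Array[String]): Unit = {
    scalarCache.access(0L)                    // touch scalar x
    (1L to 1000L).foreach(arrayCache.access)  // long streaming array walk
    println(s"scalar still cached: ${scalarCache.access(0L)}")  // true
  }
}
```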
Introducing a Data Sliding Mechanism for Cooperative Caching in Manycore Architectures
In this paper, we propose a new cooperative caching method that improves the cache miss rate for manycore microarchitectures. The work is motivated by limitations of recent adaptive cooperative caching proposals. Elastic Cooperative Caching (ECC) is a dynamic memory partitioning mechanism that allows sharing cache across cooperative nodes according to the application behavior. However, it is mainly limited by its cache eviction rate in the case of a highly stressed neighborhood. Another system, adaptive Set-Granular Cooperative Caching (ASCC), is based on finer set-based mechanisms for better adaptability. However, heavy localized cache loads are not efficiently managed. In this context, we propose a cooperative caching strategy that consists in sliding data through closer neighbors. When a cache receives a request to store a neighbor's private block, it spills its least recently used private data to a close neighbor. Thus, solicited saturated nodes slide local blocks to their respective neighbors to always provide free cache space. We also propose a new Priority-based Data Replacement policy to decide efficiently which blocks should be spilled, and a new mechanism, called the Best Neighbor selector, to choose the host destination. A first analytic performance evaluation shows that the proposed cache management policies reduce the average global communication rate by half. As frequent accesses are focused in the neighboring zones, this efficiently improves on-chip traffic. Finally, our evaluation shows that the cache miss rate is improved: each tile keeps its most frequently accessed data one hop away, instead of ejecting it off-chip. The proposed techniques notably reduce the cache miss rate under high solicitation of the cooperative zone, as shown in the performed experiments.
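A compact sketch of the sliding step (topology, capacities, and the next-tile neighbor choice are assumptions standing in for the Best Neighbor selector): a saturated tile frees space for an incoming spilled block by sliding its own least recently used block one hop further, rather than ejecting it off-chip.

```scala
// Toy data sliding across a line of tiles: stores cascade LRU blocks to the
// next neighbor instead of evicting them off-chip.
object DataSliding {
  import scala.collection.mutable.LinkedHashSet

  case class Tile(id: Int, capacity: Int) {
    val blocks = LinkedHashSet.empty[Long]   // insertion order ~ LRU order
  }

  val tiles = Vector(Tile(0, 2), Tile(1, 2), Tile(2, 2))

  def store(t: Int, block: Long): Unit = {
    val tile = tiles(t)
    if (tile.blocks.size >= tile.capacity && t + 1 < tiles.size) {
      val lru = tile.blocks.head             // slide the LRU block one hop:
      tile.blocks -= lru                     // the closer neighbor absorbs it
      store(t + 1, lru)
    }
    tile.blocks += block
  }

  def main(args: Array[String]): Unit = {
    (1L to 5L).foreach(store(0, _))
    tiles.foreach(t => println(s"tile ${t.id}: ${t.blocks.mkString(", ")}"))
  }
}
```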