1,157 research outputs found

    Fault- and Yield-Aware On-Chip Memory Design and Management

    Ever-decreasing device sizes cause more frequent hard faults, which become a serious burden on processor design and yield management. This problem is particularly pronounced in on-chip memory, which consumes up to 70% of a processor's total chip area. Traditional circuit-level techniques, such as redundancy and error correction codes, become less effective in error-prevalent environments because of their large area overhead. In this work, we suggest an architectural solution for building reliable on-chip memory in future processor environments. Our approach has two parts: a design framework and architectural techniques for on-chip memory structures. The design framework derives important architectural evaluation metrics such as yield, area, and performance from low-level defect and process-variation parameters, so processor architects can quickly evaluate their designs against these metrics. With the framework, we develop architectural yield enhancement solutions for on-chip memory structures including the L1 cache, L2 cache, and directory memory. Our proposed solutions greatly improve yield with negligible area and performance overhead. Furthermore, we develop a decoupled yield model of compute cores and L2 caches in CMPs, which shows that there will be many more L2 caches than compute cores in a chip. We propose efficient utilization techniques for the excess caches. Evaluation results show that excess caches significantly improve the overall performance of CMPs.

    Balancing reliability, cost, and performance tradeoffs with FreeFault

    Memory errors have been a major source of system failures, and fault rates may rise even further as memory continues to scale. This increasing fault rate, especially when combined with the advent of integrated on-package memories, may exceed the capabilities of traditional fault tolerance mechanisms or significantly increase their overhead. In this paper, we present FreeFault, a hardware-only, transparent, and nearly free resilience mechanism that is implemented entirely within a processor and can tolerate the majority of DRAM faults. FreeFault repurposes portions of the last-level cache for storing retired memory regions and augments a hardware memory scrubber to monitor memory health and aid retirement decisions. Because it relies on existing structures (cache associativity) for retirement/remapping-type repair, FreeFault has essentially no hardware overhead. Because it requires only a very modest portion of the cache (as small as 8KB) to cover a large fraction of DRAM faults, FreeFault has almost no impact on performance. We explain how FreeFault adds an attractive layer to the overall resilience scheme of highly reliable and highly available systems by delaying, and even entirely avoiding, calls upon software to make tradeoff decisions between memory capacity, performance, and reliability.
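
As a rough illustration of the retirement idea in the FreeFault abstract, the Python sketch below models one set of a set-associative last-level cache in which lines backing faulty DRAM locations are pinned and excluded from replacement. The class name `CacheSet`, the `max_locked` cap, and the LRU replacement policy are assumptions made for this sketch; the abstract does not specify them.

```python
from collections import OrderedDict


class CacheSet:
    """One set of a set-associative cache with pinned (retired) lines.

    Hypothetical model: names, the per-set cap, and LRU are assumptions.
    """

    def __init__(self, ways, max_locked):
        if not 0 <= max_locked < ways:
            raise ValueError("at least one way must stay available for normal data")
        self.ways = ways
        self.max_locked = max_locked
        self.locked = set()        # tags of retired lines pinned in this set
        self.lru = OrderedDict()   # unlocked tags, least recently used first

    def retire(self, tag):
        """Pin the line for a faulty location; False means the cap is reached."""
        if tag in self.locked:
            return True
        if len(self.locked) >= self.max_locked:
            return False           # caller must fall back to another repair mechanism
        self.lru.pop(tag, None)
        self.locked.add(tag)
        # Pinning shrinks the capacity left for ordinary lines.
        while len(self.lru) > self.ways - len(self.locked):
            self.lru.popitem(last=False)
        return True

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.locked:
            return True            # retired lines are always served from the cache
        if tag in self.lru:
            self.lru.move_to_end(tag)
            return True
        if len(self.lru) >= self.ways - len(self.locked):
            self.lru.popitem(last=False)   # evict the least recently used unlocked line
        self.lru[tag] = None
        return False


llc_set = CacheSet(ways=4, max_locked=1)
llc_set.retire(0xA)            # pin the line that backs a faulty DRAM location
for tag in (1, 2, 3, 4):       # ordinary traffic competes for the 3 remaining ways
    llc_set.access(tag)
print(llc_set.access(0xA))     # True: the retired line is never evicted
```

The cap on pinned ways is one possible way to bound the capacity taken from ordinary data, in the spirit of the abstract's point that only a small portion of the cache is needed.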

    Impulse: Memory System Support for Scientific Applications


    Impulse: building a smarter memory controller

    Journal Article. Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application-specific optimizations through configurable physical address remapping. By remapping physical addresses, applications control how their data is accessed and cached, improving their cache and bus utilization. Second, Impulse supports prefetching at the memory controller, which can hide much of the latency of DRAM accesses. In this paper we describe the design of the Impulse architecture and show how an Impulse memory system can be used to improve the performance of memory-bound programs. For the NAS conjugate gradient benchmark, Impulse improves performance by 67%. Because it requires no modification to processor, cache, or bus designs, Impulse can be adopted in conventional systems. In addition to scientific applications, we expect that Impulse will benefit regularly strided, memory-bound applications of commercial importance, such as database and multimedia programs.
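
To make the remapping idea concrete, the Python sketch below shows how a memory controller might translate a dense address range into strided physical addresses, so that one cache line holds only the elements the application wants (here, the diagonal of a small matrix). The `StridedRemap` descriptor, its field names, the term "shadow region", and the addresses are all hypothetical; the abstract states only that the remapping is configurable.

```python
from dataclasses import dataclass


@dataclass
class StridedRemap:
    """Hypothetical descriptor: dense element i maps to phys_base + i * stride."""
    shadow_base: int   # start of the dense (remapped) address range
    phys_base: int     # physical address of element 0
    stride: int        # bytes between consecutive elements in physical memory
    elem_size: int     # bytes per element
    count: int         # number of elements covered by the remapping

    def translate(self, shadow_addr: int) -> int:
        offset = shadow_addr - self.shadow_base
        if not 0 <= offset < self.count * self.elem_size:
            raise ValueError("address outside the remapped region")
        index, within = divmod(offset, self.elem_size)
        return self.phys_base + index * self.stride + within


def gather_line(remap: StridedRemap, shadow_line_addr: int, line_size: int) -> list:
    """Physical addresses the controller would read to fill one dense cache line."""
    return [
        remap.translate(shadow_line_addr + off)
        for off in range(0, line_size, remap.elem_size)
    ]


# Diagonal of a 4x4 row-major matrix of 8-byte values stored at 0x1000:
# consecutive diagonal elements are (4 + 1) * 8 = 40 bytes apart.
diag = StridedRemap(shadow_base=0x8000_0000, phys_base=0x1000,
                    stride=40, elem_size=8, count=4)
print(gather_line(diag, 0x8000_0000, line_size=32))  # [4096, 4136, 4176, 4216]
```

Without such a remapping, each diagonal element would sit in a different cache line, so most of every fetched line would be unused. This is the kind of cache and bus utilization the abstract refers to.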

    Row-Hammering Prevention and Main Memory Performance Improvement Using Time Window Counters

    Doctoral thesis -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Intelligent Convergence Systems major), August 2020. Jung Ho Ahn (안정호).
    Computer systems using DRAM are exposed to row-hammer (RH) attacks, which can flip data in a DRAM row without accessing that row directly, by frequently activating its adjacent rows. There have been a number of proposals to prevent RH, including both probabilistic and deterministic solutions. However, the probabilistic solutions provide protection with no capability to detect attacks and have a non-zero probability of missing protection. Counter-based deterministic solutions, in turn, either incur large area overhead or suffer a noticeable performance drop on adversarial memory access patterns. To overcome these challenges, we propose a new counter-based RH prevention solution named Time Window Counter (TWiCe) based row refresh, which accurately detects potential RH attacks using only a small number of counters with minimal performance impact. We first make the key observation that the number of rows that can cause RH is limited by the maximum row activation frequency and the DRAM cell retention time. We calculate the maximum number of counter entries required per DRAM bank, with which TWiCe prevents RH with a strong deterministic guarantee. TWiCe incurs no performance overhead on normal DRAM operations and less than 0.7% area and energy overheads over contemporary DRAM devices. Our evaluation shows that TWiCe makes no more than 0.006% additional DRAM row activations for adversarial memory access patterns, including RH attack scenarios. To reduce the area and energy overhead further, we propose the threshold-adjusted rank-level TWiCe. We first introduce pseudo-associative TWiCe (pa-TWiCe), which can search hundreds of TWiCe table entries energy-efficiently. In addition, by exploiting the pa-TWiCe structure, we propose rank-level TWiCe, which reduces the number of required entries further by managing the table entries at the rank level. We also adjust the thresholds of TWiCe to reduce the number of entries without increasing false-positive detection on general workloads. Finally, we propose extending TWiCe as a hot-page detector to improve main-memory performance. The TWiCe table contains the addresses of rows that have been frequently activated recently, and these rows are likely to be activated again because of temporal locality in memory accesses. We show how hot-page detection in TWiCe can be combined with a DRAM page swap methodology to reduce the DRAM latency for hot pages. Our evaluation shows that low-latency DRAM using TWiCe achieves up to 12.2% IPC improvement over a baseline DDR4 device for a multi-threaded workload.
    Contents:
    1. Introduction (1.1 Time Window Counter Based Row Refresh to Prevent Row-hammering; 1.2 Optimizing Time Window Counter; 1.3 Using Time Window Counters to Improve Main Memory Performance; 1.4 Outline)
    2. Background of DRAM and Row-hammering (2.1 DRAM Device Organization; 2.2 Sparing DRAM Rows to Combat Reliability Challenges; 2.3 Main Memory Subsystem Organization and Operation; 2.4 Row-hammering (RH); 2.5 Previous RH Prevention Solutions; 2.6 Limitations of the Previous RH Solutions)
    3. TWiCe: Time Window Counter based RH Prevention (3.1 TWiCe: Time Window Counter; 3.2 Proof of RH Prevention; 3.3 Counter Table Size; 3.4 Architecting TWiCe [3.4.1 Location of TWiCe Table; 3.4.2 Augmenting DRAM Interface with a New Adjacent Row Refresh (ARR) Command]; 3.5 Analysis; 3.6 Evaluation)
    4. Optimizing TWiCe to Reduce Implementation Cost (4.1 Pseudo-associative TWiCe; 4.2 Rank-level TWiCe; 4.3 Adjusting Threshold to Reduce Table Size; 4.4 Analysis; 4.5 Evaluation)
    5. Augmenting TWiCe for Hot-page Detection (5.1 Necessity of Counters for Detecting Hot Pages; 5.2 Previous Studies on Migration for Asymmetric Low-latency DRAM; 5.3 Extending TWiCe for Dynamic Hot-page Detection; 5.4 Additional Components and Methodology; 5.5 Analysis and Evaluation [5.5.1 Overhead Analysis; 5.5.2 Evaluation])
    6. Conclusion (6.1 Future work)
    Bibliography
    Abstract in Korean
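
The counter-table idea in the TWiCe abstract can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the thesis's design: the names `th_rh` (activation count that triggers a protective refresh) and `th_pi` (minimum average activations per pruning interval for a row to stay tracked), the `life` field, and the exact pruning rule are all hypothetical choices for illustration. The sketch also omits how entries expire at the end of a refresh window.

```python
class TimeWindowCounterTable:
    """Toy per-bank counter table; all names and thresholds are illustrative."""

    def __init__(self, th_rh, th_pi, num_rows):
        self.th_rh = th_rh          # activations that trigger a refresh of neighbors
        self.th_pi = th_pi          # required average activations per pruning interval
        self.num_rows = num_rows
        self.entries = {}           # row -> [activation_count, life_in_intervals]

    def activate(self, row):
        """Record one activation; return the adjacent rows to refresh, if any."""
        entry = self.entries.setdefault(row, [0, 1])
        entry[0] += 1
        if entry[0] < self.th_rh:
            return []
        del self.entries[row]       # neighbors get refreshed, so tracking restarts
        return [r for r in (row - 1, row + 1) if 0 <= r < self.num_rows]

    def prune(self):
        """Run once per pruning interval: drop rows activated too rarely."""
        for row, (count, life) in list(self.entries.items()):
            if count < self.th_pi * life:
                del self.entries[row]           # too slow to ever reach th_rh in time
            else:
                self.entries[row][1] = life + 1


table = TimeWindowCounterTable(th_rh=8, th_pi=2, num_rows=1024)
table.activate(5)          # a rarely used row
table.prune()              # 1 activation < 2 * 1, so row 5 is dropped
victims = []
for _ in range(8):         # a hammered row
    victims = table.activate(100)
print(victims)             # [99, 101]
```

Pruning is what keeps the table small: only rows activated often enough to remain a threat occupy an entry, which is consistent with the abstract's observation that the number of such rows is bounded.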

    A framework for hardware security evaluation

    Advisor: Ricardo Dahab. Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação. Degree: Master in Computer Science.
    Abstract: The hardware of computer systems plays a critical role in the security of operating systems and applications. Besides providing standard features such as execution privilege levels, it may also offer support for encryption, secure boot, secure execution, and others. To guarantee that these security features work correctly when combined inside a system, and that the system is secure as a whole, it is necessary to evaluate the security of the whole system's architecture during the hardware development life-cycle. In this work, we start by exploring the different types of existing hardware vulnerabilities and propose a taxonomy for classifying them. Our taxonomy classifies vulnerabilities according to the point in the development life-cycle at which they were introduced, and it separates real hardware vulnerabilities from software vulnerabilities that merely leverage standard hardware features. Focusing on one specific type of vulnerability, the architecture-related ones, we present a method for evaluating hardware systems using the Assurance Case methodology. This methodology has been used successfully for safety analysis, and to the best of our knowledge there are no reports of its use for hardware security analysis. Using this method, we were able to correctly identify the vulnerabilities of real-world systems. Lastly, we present a proof of concept of a tool for guiding and automating part of the proposed analysis methodology. Starting from a standardized hardware architecture description, the tool applies a set of expert system rules and generates an Assurance Case report that contains the possible security vulnerabilities of the target system. We applied the tool to the studied systems and correctly identified their known vulnerabilities, as well as other possible vulnerabilities.

    Design space exploration and optimization of path oblivious RAM in secure processors

    Keeping user data private is a huge problem in both cloud computing and computation outsourcing. One paradigm for achieving data privacy is to use tamper-resistant processors, inside which users' private data is decrypted and computed upon. These processors need to interact with untrusted external memory. Even if all data that leaves the trusted processor is encrypted, the address sequence that goes off-chip may still leak information. To prevent this address leakage, the security community has proposed ORAM (Oblivious RAM). ORAM has mainly been explored in server/file settings, which assume a vastly different computation model than secure processors. Not surprisingly, naïvely applying ORAM to a secure processor setting incurs large performance overheads. In this paper, a recent proposal called Path ORAM is studied. We demonstrate techniques to make Path ORAM practical in a secure processor setting. We introduce background eviction schemes to prevent Path ORAM failure and allow for a performance-driven design space exploration. We propose a concept called super blocks to further improve Path ORAM's performance, and also show an efficient integrity verification scheme for Path ORAM. With our optimizations, Path ORAM overhead drops by 41.8%, and SPEC benchmark execution time improves by 52.4% in relation to a baseline configuration. Our work can be used to improve the security level of previous secure processors.
    Funding: National Science Foundation (U.S.), Graduate Research Fellowship Program (Grant 1122374); American Society for Engineering Education, National Defense Science and Engineering Graduate Fellowship; United States Defense Advanced Research Projects Agency (Clean-slate design of Resilient, Adaptive, Secure Hosts, Contract N66001-10-2-4089).
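
For readers unfamiliar with the baseline that this paper optimizes, the Python sketch below implements a minimal version of the basic Path ORAM access procedure: look up the block's leaf, remap the block to a fresh random leaf, read the whole root-to-leaf path into a stash, and write the path back greedily from the leaf upward. It is a simplified model under assumptions, not the paper's design: the stash is unbounded, nothing is encrypted, and the class and parameter names are chosen for this illustration. The background eviction, super blocks, and integrity verification from the abstract are not modeled.

```python
import random


class PathORAM:
    """Minimal basic Path ORAM (unbounded stash, no encryption): a sketch only."""

    def __init__(self, levels, bucket_size=4, seed=None):
        self.levels = levels                  # leaves sit at depth `levels`
        self.num_leaves = 1 << levels
        self.bucket_size = bucket_size
        self.rng = random.Random(seed)
        # Heap layout: node 1 is the root; children of node n are 2n and 2n + 1.
        self.tree = {node: {} for node in range(1, 2 * self.num_leaves)}
        self.position = {}                    # block id -> assigned leaf
        self.stash = {}                       # block id -> data held on-chip

    def _path(self, leaf):
        """Nodes from the given leaf up to the root (leaf first)."""
        node = self.num_leaves + leaf
        path = []
        while node >= 1:
            path.append(node)
            node //= 2
        return path

    def access(self, block_id, new_data=None):
        """Read a block (or write it if new_data is given); return (data, path)."""
        leaf = self.position.get(block_id)
        if leaf is None:                      # first touch: any path will do
            leaf = self.rng.randrange(self.num_leaves)
        self.position[block_id] = self.rng.randrange(self.num_leaves)  # remap
        path = self._path(leaf)
        for node in path:                     # read the whole path into the stash
            self.stash.update(self.tree[node])
            self.tree[node] = {}
        data = self.stash.get(block_id)
        if new_data is not None:
            self.stash[block_id] = new_data
        for node in path:                     # write back, deepest bucket first
            depth = node.bit_length() - 1
            bucket = {}
            for bid in list(self.stash):
                if len(bucket) == self.bucket_size:
                    break
                target = self.num_leaves + self.position[bid]
                if target >> (self.levels - depth) == node:  # node lies on bid's path
                    bucket[bid] = self.stash.pop(bid)
            self.tree[node] = bucket
        return data, path


oram = PathORAM(levels=4, seed=1)
for i in range(20):
    oram.access(i, f"value-{i}")
data, path = oram.access(7)
print(data, len(path))                        # value-7 5
```

Every access touches exactly one full path chosen by a random leaf, regardless of which block is requested, which is why the off-chip address sequence reveals little about the access pattern. A real design must also bound the stash, which is the failure case the abstract's background eviction addresses.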