
    Improving Phase Change Memory Performance with Data Content Aware Access

    A prominent characteristic of the write operation in Phase-Change Memory (PCM) is that its latency and energy are sensitive to the data to be written as well as to the content that is overwritten. We observe that overwriting unknown memory content can incur significantly higher latency and energy than overwriting known all-zeros or all-ones content. This is because all-zeros or all-ones content is overwritten by programming the PCM cells in only one direction, i.e., using either SET or RESET operations, but not both. In this paper, we propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of PCM writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones. DATACON operates in three steps. First, it estimates how much a PCM write access would benefit from overwriting known content (e.g., all-zeros or all-ones) by comprehensively considering the number of set bits in the data to be written and the energy-latency trade-offs of the SET and RESET operations in PCM. Second, it translates the write address to a physical address within memory that contains the best type of content to overwrite, and records this translation in a table for future accesses; we exploit data access locality in workloads to minimize the address translation overhead. Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses. DATACON overwrites unknown content only when it is absolutely necessary to do so. We evaluate DATACON with workloads from state-of-the-art machine learning applications, SPEC CPU2017, and the NAS Parallel Benchmarks. Results demonstrate that DATACON significantly improves system performance and reduces memory system energy consumption compared to the best of the performance-oriented state-of-the-art techniques.
Comment: 18 pages, 21 figures; accepted at the ACM SIGPLAN International Symposium on Memory Management (ISMM).
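The content-selection step above lends itself to a small illustration. The following is a minimal sketch, not DATACON's actual implementation: the per-bit SET/RESET costs, the free-region pools, and names such as ContentAwareWriter are assumptions made purely to show how the set-bit count and the programming-direction asymmetry can drive the redirection decision.

# Illustrative sketch (not the paper's implementation) of data-content-aware
# write redirection: estimate whether a write is cheaper on top of an all-zeros
# or an all-ones region, remap it there, and remember the remapping.

SET_COST = 2.5    # hypothetical per-bit cost of programming 0 -> 1
RESET_COST = 1.0  # hypothetical per-bit cost of programming 1 -> 0

def popcount(data: bytes) -> int:
    """Number of set bits in the data to be written."""
    return sum(bin(b).count("1") for b in data)

def overwrite_cost(data: bytes, target_all_ones: bool) -> float:
    """Cost of overwriting a known region: only one programming direction is needed."""
    ones = popcount(data)
    zeros = 8 * len(data) - ones
    # All-ones target: only the 0-bits must be RESET.
    # All-zeros target: only the 1-bits must be SET.
    return zeros * RESET_COST if target_all_ones else ones * SET_COST

class ContentAwareWriter:
    """Redirects writes to pre-initialized all-zeros/all-ones locations and records
    the logical-to-physical remapping so later accesses find the data."""

    def __init__(self, free_zeros: list, free_ones: list):
        self.free_zeros = free_zeros      # addresses currently holding all-zeros
        self.free_ones = free_ones        # addresses currently holding all-ones
        self.remap = {}                   # logical address -> physical address

    def write(self, addr: int, data: bytes) -> int:
        prefer_ones = overwrite_cost(data, True) < overwrite_cost(data, False)
        pool = self.free_ones if prefer_ones else self.free_zeros
        if not pool:                      # fall back to the other pool if empty
            pool = self.free_zeros if prefer_ones else self.free_ones
        target = pool.pop() if pool else addr   # overwrite unknown content only if unavoidable
        self.remap[addr] = target
        return target                     # physical location actually programmed

# Example: mostly-zero data prefers an all-zeros victim (few SETs are needed).
writer = ContentAwareWriter(free_zeros=[0x100, 0x140], free_ones=[0x200])
print(hex(writer.write(0x10, b"\x01" + b"\x00" * 63)))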

    Design and Implementation of A High Performance Storage Leveraging the DRAM Host Interface

    Doctoral dissertation (Ph.D.) -- Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2016. Sang Lyul Min.

Storage, together with the CPU and main memory, is a key component that determines the performance of a computer system. As the amount of data that computer systems must handle keeps growing, high-performance computer systems are required, and building them requires high-performance storage. With the adoption of NAND flash memory as the storage medium, storage performance has improved dramatically and has been able to meet this demand. Storage is connected to the host system through a storage interface, which has evolved together with storage so that the host can fully exploit storage performance, and it will keep changing as storage performance improves. In particular, the rapid advance of NAND flash memory and the emergence of high-performance next-generation memories such as PRAM and ReRAM are expected to further accelerate the evolution of storage interfaces. Computer systems also contain an interface for DRAM, which is mainly used as main memory. Because the bandwidth and latency of main memory directly affect overall system performance, the DRAM interface has always provided higher bandwidth and lower latency than other interfaces, including storage interfaces. If this high-performance DRAM interface could be used as a storage interface, it would provide a much faster interface than existing storage interfaces without the need to develop a separate one. However, because storage interfaces and the DRAM interface have evolved independently around the characteristics of their respective media, current storage devices cannot simply be attached to the DRAM interface. This dissertation demonstrates the feasibility of a storage architecture that uses the high-speed DRAM interface as the physical layer of a storage interface by designing, implementing, and evaluating such an architecture. To reuse the existing DRAM interface as the physical layer, the proposed architecture employs a small shared memory buffer that behaves exactly like DRAM; this buffer is provided by the storage device and mapped into the host system's memory address space. Because the DRAM protocol alone cannot drive the proposed storage device, a new software-level storage protocol that operates over the proposed interface is defined. Based on this protocol, a storage device and a host system that use the DRAM (LPDDR3) host interface are designed and implemented. Finally, the feasibility of this work is verified by combining the proposed storage and host system into a complete Android system. A quantitative evaluation of the implemented storage shows that the new storage protocol has very low overhead and that the proposed storage achieves performance comparable to a state-of-the-art UFS 2.0 device, and a qualitative evaluation examines the improvements needed to use the proposed storage effectively.

Storage is a key factor that determines the overall performance of a computer system. In the era of big data, the demand for high-performance computer systems has been ever increasing, and high-performance storage is needed to construct such systems. Storage performance has increased dramatically with the adoption of NAND flash memory. A storage device is connected to a host system via a storage interface, which has evolved to fully exploit the performance of the storage and will continue to evolve as storage advances. However, current computer systems already have a faster interface: the DRAM interface, which provides up to 25.6 GB/s in the latest DDR4 specification. Since the protocols for storage and DRAM are not compatible, the DRAM interface cannot be exploited as a storage interface as is. In this work, a new storage protocol is proposed in order to turn the DRAM interface into a storage interface. It runs on top of the DRAM interface and builds on a small host interface buffer structure mapped into the host system's memory space. Given the protocol, a design of the storage controller and firmware is proposed; the controller natively supports the DRAM (LPDDR3) interface. A new host platform, including both hardware and software, is also proposed, since the proposed storage cannot be connected to conventional computer systems. Finally, the feasibility of this work is proved by constructing a full Android system running on the developed storage and platform. Evaluation results show that the proposed storage architecture has very low protocol-handling overheads and compares favorably to a recent commercial UFS 2.0 storage device. (A host-side sketch of command submission over such a memory-mapped buffer follows the table of contents below.)

Table of contents (page numbers omitted):
I. Introduction: research motivation; research contents; organization of the dissertation.
II. Background and Related Work: storage interfaces (desktop/server, mobile); DRAM (structure and characteristics, operations, DIMMs); NAND flash memory (structure, operations and interface, flash translation layer, flash file systems); NVDIMM devices (NVDIMM-N, NVDIMM-F, NVDIMM-P, Intel DIMM); NVDIMM software (BIOS, Linux support, Microsoft Windows support, NVM Programming Model).
III. Storage Interface Design: physical layer; storage protocol; design considerations (command completion notification, command submission integrity, data transfer load balancing).
IV. Storage Design: controller (host interface block, backbone block, flash memory interface block); firmware (host interface layer, flash translation layer, flash interface layer).
V. Host System Design: host platform hardware; boot loader; device driver; software optimizations (DMA-based data transfer, I/O splitting, minimizing cache invalidation cost, minimizing external fragmentation of data buffers, bypassing the Linux I/O scheduler).
VI. Evaluation: evaluation environment; quantitative evaluation (protocol overhead, interface buffer size and maximum transfer rate, read/write performance, performance comparison); qualitative evaluation (DRAM bus bandwidth usage, flexible DRAM address mapping, physical attachment to the DRAM interface, memory transaction ordering, boot support, high-performance general-purpose DMA).
VII. Conclusion and Future Work. References. Abstract.
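To make the buffer-based protocol concrete, here is a minimal host-side sketch under assumed conventions: the descriptor layout, the offsets, the valid flag, and the polling-based completion are illustrative stand-ins rather than the thesis's actual format, but they show how a command can be submitted and its completion polled purely through loads and stores to a device buffer mapped into the host's memory space.

# Minimal host-side sketch, under assumed conventions, of driving a storage device
# through a small buffer mapped into host memory (loads/stores only, as over a
# DRAM interface).  Offsets, descriptor layout, and polling are illustrative,
# not the thesis's actual protocol.
import struct
import time

CMD_READ, CMD_WRITE = 1, 2

class SharedBufferStorage:
    """Host-side view of the memory-mapped interface buffer."""

    CMD_OFF, VALID_OFF, STATUS_OFF, DATA_OFF = 0x00, 0x20, 0x40, 0x1000  # hypothetical layout

    def __init__(self, mmio: bytearray):
        self.buf = mmio                   # real hardware: an mmap of the device buffer

    def submit(self, opcode: int, lba: int, length: int, payload: bytes = b"") -> None:
        # Payload and descriptor are written first; the valid byte is flipped last
        # so the device never observes a half-written command (submission integrity).
        self.buf[self.DATA_OFF:self.DATA_OFF + len(payload)] = payload
        desc = struct.pack("<IIQ", opcode, length, lba)
        self.buf[self.CMD_OFF:self.CMD_OFF + len(desc)] = desc
        self.buf[self.VALID_OFF] = 1

    def wait_complete(self, timeout_s: float = 1.0) -> bool:
        # Completion is signalled by the device writing a status byte that the host
        # polls, since the plain DRAM protocol provides no interrupt of its own.
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if self.buf[self.STATUS_OFF] == 1:
                return True
        return False

# Example: submit one 4 KiB write at LBA 8 through a simulated 64 KiB buffer.
dev = SharedBufferStorage(bytearray(64 * 1024))
dev.submit(CMD_WRITE, lba=8, length=4096, payload=b"\xAB" * 4096)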

    Efficient Security in Emerging Memories

    The wide adoption of cloud computing has established the integrity and confidentiality of data in memory as a first-order design concern in modern computing systems. Data integrity is ensured by Merkle Tree (MT) memory authentication. However, in the context of emerging non-volatile memories (NVMs), the increase in cell writes and memory accesses caused by MT memory authentication imposes significant energy, lifetime, and performance overheads. This dissertation presents ASSURE, an Authentication Scheme for SecURE energy-efficient NVMs. ASSURE integrates (i) smart message authentication codes with (ii) multi-root MTs to decrease MT reads and writes, while also reducing the number of cell writes on each MT write.

Whereas data confidentiality is effectively ensured by encryption, memory access patterns can still be exploited as a side channel to obtain confidential data. Oblivious RAM (ORAM) is a secure cryptographic construct that effectively thwarts access-pattern-based attacks. However, in Path ORAM (the state-of-the-art efficient ORAM for main memories) and its variants, each last-level cache miss (read or write) is transformed into a sequence of memory reads and writes (collectively termed the read phase and the write phase, respectively), increasing the number of memory writes due to data re-encryption, increasing the effective latency of memory accesses, and degrading system performance. This dissertation efficiently addresses the challenges of both the read-phase and write-phase operations of an ORAM access. First, it presents ReadPRO (Read Promotion), an efficient ORAM scheduler that leverages runtime identification of read accesses to prioritize the service of read-phase operations on critical-path-bound read accesses, while preserving all data dependencies. Second, it presents LEO (Low overhead Encryption ORAM), which reduces cell writes by opportunistically decreasing the number of block encryptions, while preserving the security guarantees of the baseline Path ORAM.

This dissertation therefore addresses the core challenges of read/write energy and latency, endurance, and system performance for the integration of essential security primitives in emerging memory architectures. Future research directions will focus on (i) exploring efficient solutions for ORAM read-phase optimization and secure ORAM resizing, (ii) investigating the security challenges of emerging processing-in-memory architectures, and (iii) investigating the interplay of security primitives with reliability-enhancing architectures.
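As a point of reference for the integrity mechanism that ASSURE optimizes, the sketch below shows baseline Merkle-tree authentication together with the multi-root partitioning idea: each memory region keeps its own root, so a write recomputes only its region's small tree. SHA-256, the region size, and all names are illustrative assumptions rather than ASSURE's parameters, and a real design would keep the roots in trusted on-chip storage.

# Baseline sketch of Merkle-tree memory authentication plus the multi-root idea
# ASSURE builds on: memory is split into regions with independent roots, so a
# write recomputes only its own region's small tree.
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(blocks: list) -> bytes:
    """Root over the hashes of the given memory blocks."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                           # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

class MultiRootAuthenticatedMemory:
    def __init__(self, blocks: list, region_blocks: int = 4):
        self.region_blocks = region_blocks
        self.blocks = list(blocks)                   # stands in for untrusted NVM contents
        self.roots = [merkle_root(self.blocks[i:i + region_blocks])
                      for i in range(0, len(self.blocks), region_blocks)]

    def write(self, idx: int, data: bytes) -> None:
        self.blocks[idx] = data
        r = idx // self.region_blocks                # only this region's root is updated
        start = r * self.region_blocks
        self.roots[r] = merkle_root(self.blocks[start:start + self.region_blocks])

    def verify_region(self, r: int) -> bool:
        # Recompute over the (untrusted) blocks and compare with the trusted root.
        start = r * self.region_blocks
        return merkle_root(self.blocks[start:start + self.region_blocks]) == self.roots[r]

# Example: 8 blocks in 2 regions; writing block 5 touches only region 1's root.
mem = MultiRootAuthenticatedMemory([bytes([i]) * 64 for i in range(8)])
mem.write(5, b"\x7f" * 64)
assert mem.verify_region(0) and mem.verify_region(1)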

    Memory Systems and Interconnects for Scale-Out Servers

    The information revolution of the last decade has been fueled by the digitization of almost all human activities through a wide range of Internet services. The backbone of this information age is the scale-out datacenter, which must collect, store, and process massive amounts of data. These datacenters distribute vast datasets across a large number of servers, typically into memory-resident shards, so as to maintain strict quality-of-service guarantees. While data is driving the skyrocketing demand for scale-out servers, processor and memory manufacturers have reached fundamental efficiency limits and are no longer able to increase server energy efficiency at a sufficient pace. As a result, energy has emerged as the main obstacle to the scalability of information technology (IT), with huge economic implications. Delivering sustainable IT calls for a paradigm shift in computer system design.

As memory has taken a central role in IT infrastructure, memory-centric architectures are required to fully utilize IT's costly memory investment. In response, processor architects are resorting to manycore architectures to leverage the abundant request-level parallelism found in data-centric applications. Manycore processors fully utilize the available memory resources, thereby increasing IT efficiency by almost an order of magnitude. Because manycore server chips execute a large number of concurrent requests, they exhibit a high incidence of accesses to the last-level cache for fetching instructions (due to large instruction footprints) and to off-chip memory (due to the lack of temporal reuse in on-chip caches) for accessing dataset objects. As a result, on-chip interconnects and the memory system are emerging as major performance and energy-efficiency bottlenecks in servers. This thesis seeks to architect on-chip interconnects and memory systems that are tuned to the requirements of memory-centric scale-out servers. By studying a wide range of data-centric applications, we uncover application phenomena common to data-centric applications and examine their implications for on-chip network and off-chip memory traffic. Finally, we propose specialized on-chip interconnects and memory systems that leverage these common traffic characteristics, thereby improving server throughput and energy efficiency.