14 research outputs found
NVB-tree: Failure-Atomic B+-tree for Persistent Memory
Department of Computer Engineering
Emerging non-volatile memory has opened new opportunities to redesign the entire system software stack, and it is expected to break the boundary between memory and storage devices, enabling storage-less systems. Traditionally, the B-tree has been used to organize data blocks in storage systems. However, the B-tree is optimized for disk-based systems that read and write large blocks of data. When byte-addressable non-volatile memory replaces block-device storage, the byte-addressability of NVRAM makes it challenging to enforce the failure atomicity of B-tree nodes.
In this work, we present NVB-tree, which addresses this challenge by reducing cache-line flush overhead and avoiding expensive logging. NVB-tree is a hybrid index that combines the binary search tree and the B+-tree: keys within each NVB-tree node are organized as a binary search tree so that the node can benefit from the byte-addressability of binary search trees. We also present a logging-less split/merge scheme that guarantees failure atomicity using 8-byte memory writes. Our performance study shows that NVB-tree outperforms the state-of-the-art persistent index wB+-tree by a large margin.
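A hypothetical sketch of the idea in C: entries inside a node are linked into a binary search tree, and an insert becomes visible, and thus failure-atomic, through a single 8-byte link update. All names, the node layout, and the flush stub are illustrative assumptions, not the paper's actual design.

```c
#include <stdint.h>
#include <string.h>

#define NODE_CAP 32

/* Illustrative NVB-tree-style node: entries are appended to a slot array
 * and linked into a binary search tree via 8-byte indices. An entry is
 * made reachable by one 8-byte store, which NVM persists atomically, so
 * inserts need no undo logging. */
typedef struct {
    uint64_t key;
    uint64_t val;
    uint64_t left;   /* slot index of left child; 0 means "no child" */
    uint64_t right;  /* slot index of right child */
} entry_t;

typedef struct {
    uint64_t root;       /* 8-byte root index: the commit point */
    uint64_t next_free;  /* next unused slot (slot 0 is reserved) */
    entry_t  slots[NODE_CAP];
} node_t;

/* Stand-in for a cache-line writeback + fence on real NVM hardware. */
static void persist(const void *p, uint64_t n) { (void)p; (void)n; }

void node_init(node_t *n) {
    memset(n, 0, sizeof *n);
    n->next_free = 1;  /* keep slot 0 unused so index 0 can mean "none" */
    persist(n, sizeof *n);
}

int node_insert(node_t *n, uint64_t key, uint64_t val) {
    if (n->next_free >= NODE_CAP) return -1;    /* would trigger a split */
    uint64_t slot = n->next_free;
    n->slots[slot] = (entry_t){ key, val, 0, 0 };
    persist(&n->slots[slot], sizeof(entry_t));  /* durable but unreachable */
    n->next_free = slot + 1;
    /* Walk the in-node BST to find the parent link to redirect. */
    uint64_t *link = &n->root;
    while (*link != 0) {
        entry_t *e = &n->slots[*link];
        link = key < e->key ? &e->left : &e->right;
    }
    *link = slot;                    /* single 8-byte commit store */
    persist(link, sizeof(uint64_t));
    return 0;
}

int node_search(const node_t *n, uint64_t key, uint64_t *val) {
    uint64_t cur = n->root;
    while (cur != 0) {
        const entry_t *e = &n->slots[cur];
        if (key == e->key) { *val = e->val; return 0; }
        cur = key < e->key ? e->left : e->right;
    }
    return -1;
}
```

A crash between the entry flush and the link update leaves only an unreachable slot behind, which is harmless: the node's reachable state is always consistent.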
Design Guidelines for High-Performance SCM Hierarchies
With emerging storage-class memory (SCM) nearing commercialization, there is
evidence that it will deliver the much-anticipated high density and access
latencies within only a few factors of DRAM. Nevertheless, the
latency-sensitive nature of memory-resident services makes seamless integration
of SCM in servers questionable. In this paper, we ask the question of how best
to introduce SCM for such servers to improve overall performance/cost over
existing DRAM-only architectures. We first show that even with the most
optimistic latency projections for SCM, the higher memory access latency
results in prohibitive performance degradation. However, we find that
deployment of a modestly sized high-bandwidth 3D stacked DRAM cache makes the
performance of an SCM-mostly memory system competitive. The high degree of
spatial locality that memory-resident services exhibit not only simplifies the
DRAM cache's design as page-based, but also enables the amortization of
increased SCM access latencies and the mitigation of SCM's read/write latency
disparity.
We identify the set of memory hierarchy design parameters that plays a key
role in the performance and cost of a memory system combining an SCM technology
and a 3D stacked DRAM cache. We then introduce a methodology to drive
provisioning for each of these design parameters under a target
performance/cost goal. Finally, we use our methodology to derive concrete
results for specific SCM technologies. With PCM as a case study, we show that a
two bits/cell technology hits the performance/cost sweet spot, reducing the
memory subsystem cost by 40% while keeping performance within 3% of the best
performing DRAM-only system, whereas single-level and triple-level cell
organizations are impractical for use as memory replacements.
Comment: Published at MEMSYS'1
Program Context-based I/O Optimization for Data-Intensive Applications
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2019. 8.
Many kinds of data-intensive applications are in broad use today. These applications generate a great deal of I/O, for example when analyzing large volumes of data or when structuring data and storing it, so their performance is strongly influenced by the speed at which the system performs that I/O.
The operating system allocates a portion of main memory to the page cache to maximize file I/O performance by minimizing accesses to the storage device, which is far slower than main memory. Because memory is small relative to the storage device, however, achieving good file I/O performance requires managing the cache efficiently: keeping data that will be referenced again and evicting data that will not. Yet it is impossible for the system by itself to predict perfectly which data will be referenced in the future and which will not. Thus, without I/O optimization at the application level, there is a clear limit to performance improvement.
In this thesis, we propose a technique that automatically identifies and analyzes the points where I/O occurs and its patterns, based on the program context in which the application performs the I/O, and a technique that, building on the analysis results, automates the recommendation of optimizations to apply to each program context that issues I/O. Through this, the application can provide the system with various hints that the system cannot discover by itself, and the system actively exploits this information so that I/O is performed faster and resources are used more efficiently than before.
Chapter 1 Introduction
Section 1 Background of the Research
Section 2 Goals and Contributions
Section 3 Organization of the Thesis
Chapter 2 Related Work
Section 1 Buffer Caching Using Program Context
Section 2 Program-Context-based Data Separation Techniques
Chapter 3 Application I/O Analysis Based on Program Context
Section 1 Definition and Extraction of Program Contexts
Section 2 PCStat: I/O Pattern Analysis by Program Context
Section 3 Program Context Extraction for I/O-Threaded Environments
Chapter 4 Applying Program-Context-based I/O Optimization
Section 1 Hints Provided to the Page Cache
Section 2 Program-Context-based I/O Optimization via fadvise
Section 3 PCAdvisor: Automating Program-Context-based I/O Optimization
Chapter 5 Evaluation
Section 1 Experimental Environment
Section 2 Experimental Results
Chapter 6 Conclusion
Section 1 Conclusion and Future Work
References
Abstract
The Inherent Cost of Remembering Consistently
Non-volatile memory (NVM) promises fast, byte-addressable, and durable storage, with raw access latencies of the same order of magnitude as DRAM. To take advantage of NVM's durability, however, programmers need to design persistent objects that maintain consistent state across system crashes and restarts. Concurrent implementations of persistent objects typically make heavy use of expensive persistent fence instructions to order NVM accesses, negating some of NVM's performance benefits. This raises the question of the minimal number of persistent fence instructions required to implement a persistent object. We answer this question in the deterministic lock-free case by providing lower and upper bounds on the required number of fence instructions. We obtain our upper bound by presenting a new universal construction that durably implements any object using at most one persistent fence per update operation invoked. Our lower bound states that, in the worst case, each process needs to issue at least one persistent fence per update operation invoked.
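The flavor of the one-fence-per-update idea can be sketched in C: all stores of an update go into a log record, a single persist fence orders them before an 8-byte commit flag, and recovery ignores uncommitted records. The names, the record layout, and the stand-in fence (real x86 NVM code would write back the dirty lines with clwb and then issue sfence) are illustrative assumptions, not the paper's actual construction.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative per-operation log record: 'val' is the update's payload,
 * 'committed' is an 8-byte flag flipped only after the payload is
 * guaranteed durable, so recovery never sees a half-written update. */
typedef struct {
    uint64_t val;
    _Atomic uint64_t committed;
} log_rec_t;

static void persist_fence(void) {
    /* stand-in for: clwb(dirty cache lines); sfence */
    atomic_thread_fence(memory_order_release);
}

/* Apply one update with exactly one persist fence. */
void update(log_rec_t *rec, uint64_t new_val) {
    rec->val = new_val;               /* may reach NVM in any order ... */
    persist_fence();                  /* ... until this single fence */
    atomic_store(&rec->committed, 1); /* commit point */
}

/* Recovery: a record counts only if its commit flag reached NVM. */
int recover(const log_rec_t *rec, uint64_t *out) {
    if (!atomic_load(&rec->committed)) return -1;
    *out = rec->val;
    return 0;
}
```

The point of the bound is that this one fence per update is both sufficient (via a universal construction) and, in the worst case, necessary.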
Bridging the Gap between Application and Solid-State-Drives
Data storage is one of the important and often critical parts of a computing system in terms of performance, cost, reliability, and energy. Numerous new memory technologies, such as NAND flash, phase-change memory (PCM), magnetic RAM (STT-RAM), and the memristor, have emerged recently, and many of them have already entered production systems. Traditional storage optimization and caching algorithms are far from optimal because storage I/Os do not exhibit simple locality. Providing optimal storage requires accurate predictions of I/O behavior, yet workloads are increasingly dynamic and diverse, making both long- and short-term I/O prediction challenging. Because of the evolution of storage technologies and the increasing diversity of workloads, storage software is becoming more and more complex. For example, a Flash Translation Layer (FTL) is added to NAND-flash-based solid-state disks (NAND-SSDs), but it introduces overheads such as address-translation delay and garbage-collection costs. Many recent studies aim to address these overheads; unfortunately, there is no one-size-fits-all solution due to the variety of workloads. Despite rapid evolution in storage technologies, the increasing heterogeneity and diversity of machines and workloads, coupled with the continued data explosion, exacerbate the gap between computing and storage speeds. In this dissertation, we improve data storage performance through both top-down and bottom-up approaches. First, we investigate exposing storage-level parallelism so that applications can avoid I/O contention and workload skew when scheduling jobs. Second, we study how architecture-aware task scheduling can improve application performance when PCM-based NVRAM is equipped. Third, we develop an I/O-correlation-aware flash translation layer for NAND-flash-based solid-state disks.
Fourth, we build a DRAM-based correlation-aware FTL emulator and study its performance on various filesystems.
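The address-translation overhead mentioned above stems from the FTL's out-of-place writes: NAND pages cannot be overwritten in place, so every logical write lands on a fresh physical page and the mapping is redirected, leaving stale pages for garbage collection. A minimal page-mapping FTL sketch; the sizes, names, and naive append-only allocator are illustrative assumptions, not the dissertation's design:

```c
#include <stdint.h>

#define N_LOGICAL  64
#define N_PHYSICAL 128
#define INVALID    UINT32_MAX

/* Illustrative page-mapping FTL state. */
typedef struct {
    uint32_t l2p[N_LOGICAL];    /* logical -> physical page map */
    uint8_t  stale[N_PHYSICAL]; /* pages invalidated by rewrites */
    uint32_t next_free;         /* naive append-only allocator */
} ftl_t;

void ftl_init(ftl_t *f) {
    for (int i = 0; i < N_LOGICAL; i++) f->l2p[i] = INVALID;
    for (int i = 0; i < N_PHYSICAL; i++) f->stale[i] = 0;
    f->next_free = 0;
}

/* Write a logical page: returns the physical page the data landed on,
 * or INVALID when no free page remains (a real FTL would trigger
 * garbage collection here to reclaim stale pages). */
uint32_t ftl_write(ftl_t *f, uint32_t lpn) {
    if (lpn >= N_LOGICAL || f->next_free >= N_PHYSICAL) return INVALID;
    if (f->l2p[lpn] != INVALID)
        f->stale[f->l2p[lpn]] = 1;  /* old copy becomes garbage */
    f->l2p[lpn] = f->next_free++;   /* out-of-place: always a new page */
    return f->l2p[lpn];
}

/* Translate a read: this indirection is the address-translation delay. */
uint32_t ftl_read(const ftl_t *f, uint32_t lpn) {
    return lpn < N_LOGICAL ? f->l2p[lpn] : INVALID;
}
```

An I/O-correlation-aware FTL, as studied in the dissertation, would go further and choose where correlated pages land so that they can be reclaimed together.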
Memory Subsystems for Security, Consistency, and Scalability
In response to the continuous demand for processing ever-larger datasets, as well as discoveries in next-generation memory technologies, researchers have been vigorously studying memory-driven computing architectures that allow data-intensive applications to access enormous amounts of pooled non-volatile memory. As applications interact with ever more components and datasets, existing systems struggle to efficiently enforce the principle of least privilege for security. While non-volatile memory can retain data even after a power loss and allows for large main-memory capacity, programmers must bear the burdens of maintaining the consistency of program memory for fault tolerance and of handling huge datasets through traditional yet expensive memory-management interfaces for scalability. Today's computer systems have become too sophisticated for existing memory subsystems to handle many design requirements. In this dissertation, we introduce three memory subsystems that address challenges in security, consistency, and scalability. Specifically, we propose SMVs to give threads fine-grained control over access privileges in a partially shared address space for security, NVthreads to let programmers easily leverage non-volatile memory with automatic persistence for consistency, and PetaMem to enable memory-centric applications to freely access memory beyond the traditional process boundary, with support for memory isolation and crash recovery, for security, consistency, and scalability.
In response to the continuous demand for the ability to process ever larger datasets, as well as discoveries in next-generation memory technologies, researchers have been vigorously studying memory-driven computing architectures that shall allow data-intensive applications to access enormous amounts of pooled non-volatile memory. As applications continue to interact with increasing amounts of components and datasets, existing systems struggle to eรฟciently enforce the principle of least privilege for security. While non-volatile memory can retain data even after a power loss and allow for large main memory capacity, programmers have to bear the burdens of maintaining the consistency of program memory for fault tolerance as well as handling huge datasets with traditional yet expensive memory management interfaces for scalability. Todayโs computer systems have become too sophisticated for existing memory subsystems to handle many design requirements. In this dissertation, we introduce three memory subsystems to address challenges in terms of security, consistency, and scalability. Specifcally, we propose SMVs to provide threads with fne-grained control over access privileges for a partially shared address space for security, NVthreads to allow programmers to easily leverage nonvolatile memory with automatic persistence for consistency, and PetaMem to enable memory-centric applications to freely access memory beyond the traditional process boundary with support for memory isolation and crash recovery for security, consistency, and scalability