628 research outputs found
Improving write performance by enhancing internal parallelism of Solid State Drives
Abstract—Most research on Solid State Drive (SSD) architectures focuses on Flash Translation Layer (FTL) algorithms and wear-leveling; internal parallelism in SSDs, however, has not been well explored. In this work, we propose a new strategy to improve SSD write performance by enhancing the internal parallelism inside SSDs. An SDRAM buffer is added to the design for buffering and scheduling write requests. Because the same logical block numbers may be translated to different physical numbers at different times by the FTL, the on-board SDRAM buffer holds requests below the FTL. When the buffer is full, the same amount of data is assigned to each storage package in the SSD to enhance internal parallelism. To evaluate performance accurately, we use both synthetic workloads and real-world applications in our experiments. Because it would be unfair to compare an SSD with a buffer against one without, we compare the enhanced internal-parallelism scheme with a traditional LRU strategy using the same buffer size. The simulation results demonstrate that the write performance of our design is significantly improved over the LRU-cache strategy with the same buffer size.
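The buffering-then-striping idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, request representation, and round-robin assignment policy are assumptions for demonstration.

```python
from collections import deque

class ParallelWriteBuffer:
    """Buffer write requests below the FTL, then flush them evenly
    across the SSD's flash packages so the packages can program
    their pages concurrently (illustrative sketch)."""

    def __init__(self, num_packages, capacity):
        self.num_packages = num_packages
        self.capacity = capacity          # buffer size, in requests
        self.pending = deque()            # buffered write requests
        self.package_queues = [deque() for _ in range(num_packages)]

    def write(self, request):
        self.pending.append(request)
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self):
        # Assign the same amount of data to each package,
        # round-robin, to maximize internal parallelism.
        i = 0
        while self.pending:
            self.package_queues[i % self.num_packages].append(self.pending.popleft())
            i += 1

buf = ParallelWriteBuffer(num_packages=4, capacity=8)
for lbn in range(8):
    buf.write(("write", lbn))
print([len(q) for q in buf.package_queues])  # -> [2, 2, 2, 2]
```

Each flush hands every package an equal share of the buffered data, which is the property the abstract credits for the write-performance gain.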
B+-tree Index Optimization by Exploiting Internal Parallelism of Flash-based Solid State Drives
Previous research addressed the potential problems of the hard-disk-oriented
design of DBMSs on flashSSDs. In this paper, we focus on exploiting the potential
benefits of flashSSDs. First, we examine the internal parallelism of
flashSSDs by running benchmarks on various flashSSDs. Then, we suggest
algorithm-design principles in order to best benefit from the internal
parallelism. We present a new I/O request concept, called psync I/O that can
exploit the internal parallelism of flashSSDs in a single process. Based on
these ideas, we introduce B+-tree optimization methods in order to utilize
internal parallelism. By integrating the results of these methods, we present a
B+-tree variant, PIO B-tree. We confirmed that each optimization method
substantially enhances the index performance. Consequently, PIO B-tree enhanced
B+-tree's insert performance by a factor of up to 16.3, while improving
point-search performance by a factor of 1.2. The range search of PIO B-tree was
up to 5 times faster than that of the B+-tree. Moreover, PIO B-tree
outperformed other flash-aware indexes in various synthetic workloads. We also
confirmed that PIO B-tree outperforms the B+-tree on index traces collected inside
the PostgreSQL DBMS under the TPC-C benchmark. Comment: VLDB201
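The psync I/O concept above submits a batch of page requests at once from a single process and returns when all of them complete, so the drive's internal channels can serve the batch in parallel. A hedged sketch follows; the thread-pool emulation, function name, and device path are illustrative assumptions (a real implementation might use libaio or io_uring rather than threads).

```python
import os
from concurrent.futures import ThreadPoolExecutor

def psync_read(fd, offsets, page_size=4096):
    """Issue one batch of page reads and return only when *all*
    complete, letting the SSD serve them from different internal
    parallel units concurrently (thread-pool emulation)."""
    with ThreadPoolExecutor(max_workers=len(offsets)) as pool:
        futures = [pool.submit(os.pread, fd, page_size, off) for off in offsets]
        return [f.result() for f in futures]

# Usage sketch: read 8 scattered pages in one batch (path illustrative).
# fd = os.open("/dev/nvme0n1", os.O_RDONLY)
# pages = psync_read(fd, [i * 4096 for i in range(8)])
```

The point is that a single process gets many requests in flight at once, which is what lets an index structure such as PIO B-tree exploit the drive's internal parallelism.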
LSM-tree based Database System Optimization using Application-Driven Flash Management
Thesis (Master's) -- Graduate School, Seoul National University: College of Engineering, Department of Computer Science and Engineering, August 2019.
Modern data centers aim to take advantage of high parallelism in storage
devices for I/O-intensive applications such as storage servers, cache systems, and
key-value stores. Key-value stores are among the most typical applications that
must provide a highly reliable service with high performance. To increase the I/O
performance of key-value stores, many data centers have actively adopted
next-generation storage devices such as Non-Volatile Memory Express (NVMe) based
Solid State Drives (SSDs). NVMe SSDs and their protocol are designed to provide
a high degree of parallelism. However, they may not guarantee predictable
performance while providing high performance and parallelism. For example,
heavily mixed read and write requests can degrade throughput and response time
due to interference between the requests and internal operations
(e.g., Garbage Collection (GC)).
To minimize this interference and provide higher performance, this paper
presents IsoKV, an isolation scheme for key-value stores that exploits internal
parallelism in SSDs. IsoKV manages the SSD's level of parallelism directly by
running an application-driven flash management scheme. By storing data with
different characteristics in dedicated internal parallel units of the SSD,
IsoKV reduces interference between I/O requests. In addition, IsoKV eliminates
GC by synchronizing the SSD's data management with the LSM-tree logic. We
implement IsoKV on RocksDB and evaluate it using an Open-Channel SSD. Our
extensive experiments show that IsoKV improves overall throughput by 1.20× and
response time by 43% on average compared with the existing scheme.
Abstract
Introduction 1
Background 8
Log-Structured Merge tree based Database 8
Open-Channel SSDs 9
Preliminary Experimental Evaluation using oc bench 10
Design and Implementation 14
Overview of IsoKV 14
GC-free flash storage management synchronized with LSM-tree logic 15
I/O type Isolation through Application-Driven Flash Management 17
Dynamic Arrangement of NAND-Flash Parallelism 19
Implementation 21
Evaluation 23
Experimental Setup 23
Performance Evaluation 25
Related Work 31
Conclusion 34
Bibliography 35
Abstract (in Korean) 40
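IsoKV's core idea, per the abstract, is dedicating separate internal parallel units (e.g., Open-Channel SSD channels) to data with different characteristics so their I/O streams do not interfere. A minimal placement sketch is below; the channel counts and stream names are illustrative assumptions, not the thesis's actual configuration.

```python
class IsoKVPlacement:
    """Map each I/O stream (e.g., an LSM-tree level or data type) to
    its own dedicated set of flash channels, so background compaction
    writes never share a parallel unit with latency-sensitive reads."""

    def __init__(self, num_channels):
        self.num_channels = num_channels
        self.dedicated = {}   # stream name -> list of channel ids
        self.next_free = 0

    def dedicate(self, stream, n):
        # Reserve n exclusive channels for this stream.
        if self.next_free + n > self.num_channels:
            raise ValueError("not enough free channels")
        self.dedicated[stream] = list(range(self.next_free, self.next_free + n))
        self.next_free += n

    def channel_for(self, stream, block_id):
        # Stripe the stream's blocks across its own channels only.
        chans = self.dedicated[stream]
        return chans[block_id % len(chans)]

ssd = IsoKVPlacement(num_channels=8)
ssd.dedicate("L0_writes", 2)      # hot, bursty writes
ssd.dedicate("compaction", 2)     # background compaction I/O
ssd.dedicate("reads", 4)          # latency-sensitive reads
print(ssd.channel_for("reads", 10))  # -> 6 (always within channels 4..7)
```

Because each stream is striped only within its reserved channels, a compaction burst cannot queue behind, or in front of, a user read, which is the isolation property the evaluation attributes the throughput and latency gains to.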
Disaggregating non-volatile memory for throughput-oriented genomics workloads
Massive exploitation of next-generation sequencing technologies requires dealing with both huge amounts of data and complex bioinformatics pipelines. Computing architectures have evolved to deal with these problems, enabling approaches that were unfeasible years ago: accelerators and Non-Volatile Memories (NVM) are becoming widely used to enhance the most demanding workloads. However, bioinformatics workloads are usually part of bigger pipelines with different and dynamic needs in terms of resources. The introduction of Software Defined Infrastructures (SDI) for data centers lays the groundwork to dramatically increase the efficiency of infrastructure management. SDI enables new ways to structure hardware resources through disaggregation, and provides new hardware composability and sharing mechanisms to deploy workloads in more flexible ways. In this paper we study a state-of-the-art genomics application, SMUFIN, aiming to address the challenges of future HPC facilities. This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitiveness (TIN2015-65316-P) and the Generalitat de Catalunya (2014-SGR-1051).
Understanding and Optimizing Flash-based Key-value Systems in Data Centers
Flash-based key-value systems are widely deployed in today's data centers to provide high-speed data processing services. These systems deploy flash-friendly data structures, such as slabs and the Log-Structured Merge (LSM) tree, on flash-based Solid State Drives (SSDs) and provide efficient solutions in caching and storage scenarios. With the rapid evolution of data centers, many challenges and opportunities for future optimization appear.
In this dissertation, we focus on understanding and optimizing flash-based key-value systems from the perspective of workloads, software, and hardware as data centers evolve. We first propose an on-line compression scheme, called SlimCache, which considers the unique characteristics of key-value workloads to virtually enlarge the cache space, increase the hit ratio, and improve cache performance. Furthermore, to appropriately configure increasingly complex modern key-value data systems, which can have more than 50 parameters plus additional hardware and system settings, we quantitatively study and compare five multi-objective optimization methods for auto-tuning the performance of an LSM-tree based key-value store in terms of throughput, 99th-percentile tail latency, convergence time, real-time system throughput, and the iteration process. Last but not least, we conduct an in-depth, comprehensive measurement study of flash-optimized key-value stores on recently emerging 3D XPoint SSDs. We reveal several unexpected bottlenecks in current key-value store designs and present three exemplary case studies to showcase the efficacy of removing these bottlenecks with simple methods on 3D XPoint SSDs. Our experimental results show that our proposed solutions significantly outperform traditional methods. Our study also provides system implications for auto-tuning key-value systems on flash-based SSDs and optimizing them on revolutionary 3D XPoint based SSDs.
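SlimCache's high-level idea, as described above, is to compress cached key-value pairs on-line so the same capacity holds more items and the hit ratio rises. The toy sketch below illustrates that idea only; the zlib codec, class name, and in-memory dict standing in for flash slabs are assumptions for demonstration, not SlimCache's actual design.

```python
import zlib

class CompressedCache:
    """Toy cache that stores values compressed, so a fixed byte budget
    holds more entries (virtually enlarged cache space)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.store = {}   # key -> compressed blob (stands in for flash slabs)

    def put(self, key, value):
        blob = zlib.compress(value)
        # Evict entries (most recently inserted first, for simplicity)
        # until the compressed blob fits in the budget.
        while self.used + len(blob) > self.capacity and self.store:
            _, old = self.store.popitem()
            self.used -= len(old)
        if self.used + len(blob) <= self.capacity:
            self.store[key] = blob
            self.used += len(blob)

    def get(self, key):
        blob = self.store.get(key)
        return None if blob is None else zlib.decompress(blob)

cache = CompressedCache(capacity_bytes=1024)
cache.put("user:1", b"A" * 4000)           # compresses far below 1 KiB
print(cache.get("user:1") == b"A" * 4000)  # -> True
```

A 4000-byte value fits in a 1 KiB budget only because it is stored compressed; with incompressible values the same budget would hold far fewer entries, which is why a real system must decide on-line what to compress.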
- …