Elevating commodity storage with the SALSA host translation layer
To satisfy increasing storage demands in both capacity and performance,
industry has turned to multiple storage technologies, including Flash SSDs and
SMR disks. These devices employ a translation layer that conceals the
idiosyncrasies of their media and enables random access. Device translation
layers are, however, inherently constrained: resources on the drive are scarce,
they cannot be adapted to application requirements, and lack visibility across
multiple devices. As a result, performance and durability of many storage
devices are severely degraded.
In this paper, we present SALSA: a translation layer that executes on the
host and allows unmodified applications to better utilize commodity storage.
SALSA supports a wide range of single- and multi-device optimizations and,
because it is implemented in software, can adapt to specific workloads. We
describe SALSA's design, and demonstrate its significant benefits using
microbenchmarks and case studies based on three applications: MySQL, the Swift
object store, and a video server.
Comment: Presented at the 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
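The core mechanism such a host translation layer relies on can be illustrated with a toy log-structured remapping table. Everything below (class name, fields, the flat page model) is invented for illustration and is not SALSA's actual design: random logical writes are redirected to sequential physical pages, and a host-resident table tracks the latest location of each block.

```python
class HostTranslationLayer:
    def __init__(self, num_physical_pages):
        self.mapping = {}          # logical block -> physical page
        self.invalid = set()       # physical pages holding stale data
        self.next_free = 0         # sequential write frontier
        self.capacity = num_physical_pages

    def write(self, logical_block):
        """Redirect a (possibly random) logical write to the next sequential page."""
        if self.next_free >= self.capacity:
            raise RuntimeError("device full: garbage collection needed")
        old = self.mapping.get(logical_block)
        if old is not None:
            self.invalid.add(old)  # the previous copy becomes garbage
        self.mapping[logical_block] = self.next_free
        self.next_free += 1

    def read(self, logical_block):
        return self.mapping.get(logical_block)

tl = HostTranslationLayer(num_physical_pages=8)
for lb in (5, 2, 5):               # random logical blocks; block 5 is overwritten
    tl.write(lb)

assert tl.read(5) == 2             # latest copy of block 5 lives at page 2
assert tl.invalid == {0}           # page 0 (old copy of block 5) is now stale
```

Because the physical write stream is strictly sequential, the same structure serves both flash (which prefers sequential programs) and SMR zones (which require them); running it on the host is what lets it span multiple devices and adapt to the workload.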
HetFS: A heterogeneous file system for everyone
Storage devices have become more and more diverse during the last decade. The advent of SSDs made it painfully clear that rotating devices, such as HDDs or magnetic tapes, were lacking in terms of response time. However, SSDs currently have a limited number of write cycles and a significantly higher price per capacity, which has prevented rotational technologies from being abandoned. Additionally, Non-Volatile Memories (NVMs) have lately been gaining traction, offering devices that typically outperform NAND-based SSDs but exhibit a whole new set of idiosyncrasies.
Therefore, in order to appropriately support this diversity, intelligent mechanisms will be needed in the near future to balance the benefits and drawbacks of each storage technology available to a system. In this paper, we present a first step towards such a mechanism: HetFS, an extension to the ZFS file system that is capable of choosing the storage device a file should be kept on according to preprogrammed filters. We introduce the prototype and show some preliminary results of the effects obtained when placing specific files on different devices.
The research leading to these results has received funding from the European Community under the BIGStorage ETN (Project 642963 of the H2020-MSCA-ITN-2014), by the Spanish Ministry of Economy and Competitiveness under the TIN2015-65316 grant and by the Catalan Government under the 2014-SGR-1051 grant. To learn more about the BigStorage project, please visit http://bigstorage-project.eu/.
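The filter-driven placement the abstract describes might look like the following sketch. The filter signatures, device classes, and thresholds are assumptions for illustration, not HetFS's real interface: each filter inspects file attributes and nominates a device class, the first match wins, and a default applies otherwise.

```python
def by_extension(path, size):
    """Hypothetical filter: latency-sensitive file types go to the SSD."""
    hot = (".db", ".log", ".tmp")
    if path.endswith(hot):
        return "ssd"
    return None

def by_size(path, size):
    """Hypothetical filter: very large files go to cheap rotating capacity."""
    if size >= 1 << 30:            # 1 GiB or more
        return "hdd"
    return None

def place(path, size, filters=(by_extension, by_size), default="ssd"):
    """Return the device class chosen by the first matching filter."""
    for f in filters:
        device = f(path, size)
        if device is not None:
            return device
    return default

assert place("/var/lib/app.db", 4096) == "ssd"
assert place("/archive/video.mkv", 8 << 30) == "hdd"
assert place("/home/user/notes.txt", 2048) == "ssd"   # default fallback
```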
Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency
Persistent memory provides high-performance data persistence at main memory.
Memory writes need to be performed in strict order to satisfy storage
consistency requirements and enable correct recovery from system crashes.
Unfortunately, adhering to such a strict order significantly degrades system
performance and persistent memory endurance. This paper introduces a new
mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering
requirements at significantly lower performance and endurance loss. LOC
consists of two key techniques. First, Eager Commit eliminates the need to
perform a persistent commit record write within a transaction. We do so by
ensuring that we can determine the status of all committed transactions during
recovery by storing the necessary metadata statically with blocks of
data written to memory. Second, Speculative Persistence relaxes the write
ordering between transactions by allowing writes to be speculatively written to
persistent memory. A speculative write is made visible to software only after
its associated transaction commits. To enable this, our mechanism supports the
tracking of committed transaction IDs and multi-versioning in the CPU cache. Our
evaluations show that LOC reduces the average performance overhead of memory
persistence from 66.9% to 34.9% and the memory write traffic overhead from
17.1% to 3.4% on a variety of workloads.
Comment: This paper has been accepted by IEEE Transactions on Parallel and Distributed Systems.
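The recovery-time idea behind Eager Commit, as the abstract describes it, can be sketched as follows. The per-block data layout and field names here are assumptions: if each persisted block carries its transaction ID plus the transaction's total block count, recovery can decide commit status by counting blocks, with no separate commit record write inside the transaction.

```python
from collections import Counter

def committed_transactions(persisted_blocks):
    """persisted_blocks: iterable of (txid, total_blocks_in_tx), one per block
    found in persistent memory after a crash."""
    seen = Counter(txid for txid, _ in persisted_blocks)
    totals = {txid: total for txid, total in persisted_blocks}
    # A transaction counts as committed only if every one of its blocks
    # reached persistent memory before the crash; partial ones are discarded.
    return {txid for txid, total in totals.items() if seen[txid] == total}

blocks = [
    (1, 2), (1, 2),        # tx 1: both of its 2 blocks persisted -> committed
    (2, 3), (2, 3),        # tx 2: only 2 of its 3 blocks persisted -> discarded
]
assert committed_transactions(blocks) == {1}
```

Speculative Persistence then relaxes ordering across transactions; in this simplified model, speculatively persisted blocks of a not-yet-committed transaction are simply never reported by the recovery scan, which is why software never observes them.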
A Survey on the Integration of NAND Flash Storage in the Design of File Systems and the Host Storage Software Stack
With the ever-increasing amount of data generated in the world, estimated to reach over 200 Zettabytes by 2025, pressure on efficient data storage systems is intensifying. The shift from HDD to flash-based SSD represents one of the most fundamental shifts in storage technology, increasing performance capabilities significantly. However, flash storage comes with different characteristics than prior HDD storage technology, so existing storage software was ill-suited to leveraging its capabilities. As a result, a plethora of storage applications have been designed to better integrate with flash storage and align with flash characteristics. In this literature study we evaluate the effect the introduction of flash storage has had on the design of file systems, which provide one of the most essential mechanisms for managing persistent storage. We analyze the mechanisms for effectively managing flash storage, managing the overheads of introduced design requirements, and leveraging the capabilities of flash storage. Numerous methods have been adopted in file systems; however, they prominently revolve around similar design decisions: adhering to flash hardware constraints and limiting software intervention. The future design of storage software remains important given the constant growth in flash-based storage devices and interfaces, which provides increasing opportunities to enhance flash integration in the host storage software stack.
WLFC: Write Less in Flash-based Cache
Flash-based disk caches, for example Bcache and Flashcache, have gained
tremendous popularity in industry in the last decade because of their low energy
consumption, non-volatile nature and high I/O speed. However, these cache systems
exhibit worse write performance than read performance because of the
asymmetric I/O costs and the internal GC mechanism. In addition to the
performance issues, since NAND flash is a type of EEPROM device, its
lifespan is also limited by Program/Erase (P/E) cycles. How to improve
the performance and lifespan of flash-based caches in write-intensive
scenarios has therefore long been a pressing issue. Benefiting from Open-Channel SSDs
(OCSSDs), we propose a write-friendly flash-based disk cache system, which is
called WLFC (Write Less in the Flash-based Cache). In WLFC, a strictly
sequential writing method is used to minimize the write amplification. A new
replacement algorithm for the write buffer is designed to minimize the erase
count caused by eviction, and a new data layout strategy is designed to
minimize the metadata size persisted in SSDs. As a result, the Over-Provisioned
(OP) space is completely removed, the erase count of the flash is greatly
reduced, and the metadata size is 1/10 or less of that in Bcache. Even with a
small amount of metadata, data consistency after a crash is still
guaranteed. Compared with existing mechanisms, WLFC brings a 7%-80%
reduction in write latency, a 1.07×-4.5× increase in write throughput, and a
50%-88.9% reduction in erase count, with a moderate overhead in read
performance.
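The strictly sequential writing method can be sketched as a circular log in which a segment is erased once per pass rather than rewritten in place. The segment geometry and the erase accounting below are illustrative assumptions, not WLFC's implementation:

```python
class SequentialFlashLog:
    def __init__(self, num_segments, pages_per_segment):
        self.capacity = num_segments * pages_per_segment
        self.pages_per_segment = pages_per_segment
        self.cursor = 0            # global append-only write pointer
        self.erase_count = 0

    def append(self, _data):
        """Append at the log head; flash pages are never rewritten in place."""
        pos = self.cursor % self.capacity
        if pos % self.pages_per_segment == 0:
            # Entering a segment at its start costs exactly one erase per
            # pass over the device (a deliberate simplification).
            self.erase_count += 1
        self.cursor += 1
        return pos

log = SequentialFlashLog(num_segments=2, pages_per_segment=4)
positions = [log.append(b"blk") for _ in range(8)]
assert positions == [0, 1, 2, 3, 4, 5, 6, 7]
assert log.erase_count == 2        # one erase per segment, not per write
```

Because every physical write lands at the log head, write amplification from in-place updates disappears, which is the property the abstract credits for removing the over-provisioned space.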
RapidSwap: Efficient Hierarchical Far Memory
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2021. 8. Bernhard Egger.
As computation responsibilities are transferred and migrated to cloud computing environments, cloud operators are facing more and more challenges to accommodate the workloads provided by their customers. Modern applications typically require a massive amount of main memory. DRAM allows the robust delivery of data to processing entities in conventional node-centric architectures. However, physically expanding DRAM is impracticable due to hardware limits and cost. In this thesis, we present RapidSwap, an efficient hierarchical far memory that exploits phase-change memory (persistent memory) in data centers to deliver near-DRAM performance at a significantly lower total cost of ownership (TCO). RapidSwap migrates cold memory contents to slower and cheaper storage devices by tracking the memory access frequency of applications. Evaluated with several different real-world cloud benchmark scenarios, RapidSwap achieves a reduction of 20% in operating cost at minimal performance degradation and is 30% more cost-effective than pure DRAM solutions. RapidSwap exemplifies that sophisticated utilization of novel storage technologies can yield significant TCO savings in cloud data centers.
Chapter 1 Introduction 1
Chapter 2 Background 4
2.1 Tiered Storage 4
2.2 Trends in Storage Devices 5
2.3 Techniques Proposed to Lower Memory Pressure 5
2.3.1 Transparent Memory Compression 5
2.3.2 Far Memory 6
Chapter 3 Motivation 9
3.1 Limitations of Existing Techniques 9
3.2 Tiered Storage as a Promising Alternative 10
Chapter 4 RapidSwap Design and Implementation 12
4.1 RapidSwap Design 12
4.1.1 Storage Frontend 12
4.1.2 Storage Backend 15
4.2 RapidSwap Implementation 17
4.2.1 Swap Handler 17
4.2.2 Storage Frontend 18
4.2.3 Storage Backend 20
Chapter 5 Results 21
5.1 Experimental Setup 21
5.2 RapidSwap Performance 23
5.2.1 Degradation over DRAM 23
5.2.2 Tiered Storage Utilization 27
5.2.3 Hit/Miss Analysis 28
5.3 Cost of Storage Tier 29
5.4 Cost Effectiveness 30
Chapter 6 Conclusion and Future Work 32
6.1 Conclusion 32
6.2 Future Work 33
Bibliography 34
Abstract (in Korean) 39
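The frequency-based demotion policy the RapidSwap abstract describes reduces, in essence, to a threshold classifier over per-page access rates. The tier names and thresholds below are invented for the sketch; they are not the thesis's actual parameters:

```python
TIERS = ["dram", "pmem", "ssd"]    # fast/expensive -> slow/cheap

def place_page(accesses_per_sec, hot=100.0, warm=1.0):
    """Pick the cheapest tier whose performance the page's heat still justifies.

    Pages accessed frequently stay in DRAM; cooling pages are demoted first
    to persistent memory, then to SSD (thresholds are illustrative)."""
    if accesses_per_sec >= hot:
        return "dram"
    if accesses_per_sec >= warm:
        return "pmem"
    return "ssd"

assert place_page(500.0) == "dram"
assert place_page(10.0) == "pmem"
assert place_page(0.01) == "ssd"
```

The TCO argument follows directly: only the genuinely hot fraction of the working set pays DRAM prices, while cold pages ride on cheaper media at a small latency cost on the rare access.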
Program Context-Based Optimization Techniques for Improving the Performance and Lifetime of NAND Flash Storage Devices
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2019. 2.
Replacing HDDs with NAND flash-based storage devices (SSDs) has been
one of the major challenges in modern computing systems especially in regards to better performance and higher mobility. Although the continuous
semiconductor process scaling and multi-leveling techniques lower the price
of SSDs to the comparable level of HDDs, the decreasing lifetime of NAND
flash memory, as a side effect of recent advanced device technologies, is
emerging as one of the major barriers to the wide adoption of SSDs in high-performance computing systems.
In this dissertation, system-level lifetime improvement techniques for
recent high-density NAND flash memory are proposed. Unlike existing techniques, the proposed techniques resolve the problems of decreasing performance and lifetime of NAND flash memory by exploiting the I/O context
of an application to analyze data lifetime patterns or duplicate data contents
patterns.
We first present that I/O activities of an application have distinct data
lifetime and duplicate data patterns. In order to effectively utilize the context information, we implemented the program context extraction method.
With the program context, we can overcome the limitations of existing techniques for improving the garbage collection overhead and limited lifetime
of NAND flash memory.
Second, we propose a system-level approach to reduce WAF that exploits the I/O context of an application to increase the data lifetime prediction for the multi-streamed SSDs. The key motivation behind the proposed
technique was that data lifetimes should be estimated at a higher abstraction
level than LBAs, so we employ a write program context as a stream management unit. Thus, it can effectively separate data with short lifetimes from
data with long lifetimes to improve the efficiency of garbage collection.
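The PC-as-stream-unit idea can be sketched as follows. The PC encoding (summing return addresses) and the lifetime table are simplified assumptions for illustration: writes issued from the same program context are assumed to share a lifetime, so mapping PCs (rather than LBAs) to streams separates short-lived from long-lived data regardless of which addresses they touch.

```python
def program_context(call_stack):
    """Reduce a call stack (list of return addresses) to one PC identifier."""
    return sum(call_stack) & 0xFFFFFFFF

def assign_stream(pc, observed_lifetimes, num_streams=4, max_lifetime=1000.0):
    """Bucket a PC into a stream by its average observed data lifetime."""
    samples = observed_lifetimes.get(pc)
    if not samples:
        return num_streams - 1          # unknown PCs share the coldest stream
    avg = sum(samples) / len(samples)
    bucket = int(avg / max_lifetime * num_streams)
    return min(bucket, num_streams - 1)

lifetimes = {
    program_context([0x401a20, 0x402b10]): [5.0, 8.0],      # short-lived writes
    program_context([0x401a20, 0x403c30]): [900.0, 950.0],  # long-lived writes
}
hot_pc = program_context([0x401a20, 0x402b10])
cold_pc = program_context([0x401a20, 0x403c30])
assert assign_stream(hot_pc, lifetimes) == 0   # hot data -> its own stream
assert assign_stream(cold_pc, lifetimes) == 3  # cold data kept apart
```

Since the SSD garbage-collects per stream, keeping each stream's lifetimes homogeneous means whole blocks tend to become invalid together, which is the WAF reduction the dissertation targets.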
Lastly, we propose a selective deduplication that can avoid unnecessary deduplication work based on the duplicate data pattern analysis of write
program context. With the help of selective deduplication, we also propose
fine-grained deduplication, which improves the likelihood of eliminating redundant data by introducing sub-page chunks. It also resolves technical difficulties caused by the finer granularity, i.e., increased memory requirements
and read response time.
In order to evaluate the effectiveness of the proposed techniques, we
performed a series of evaluations using both a trace-driven simulator and
emulator with I/O traces which were collected from various real-world systems. To understand the feasibility of the proposed techniques, we also implemented them in Linux kernel on top of our in-house flash storage prototype and then evaluated their effects on the lifetime while running real-world
applications. Our experimental results show that system-level optimization
techniques are more effective than existing optimization techniques.
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Garbage Collection Problem . . . . . . . . . . . . . 2
1.1.2 Limited Endurance Problem . . . . . . . . . . . . . 4
1.2 Dissertation Goals . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Dissertation Structure . . . . . . . . . . . . . . . . . . . . . 7
II. Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 NAND Flash Memory System Software . . . . . . . . . . . 9
2.2 NAND Flash-Based Storage Devices . . . . . . . . . . . . . 10
2.3 Multi-stream Interface . . . . . . . . . . . . . . . . . . . . 11
2.4 Inline Data Deduplication Technique . . . . . . . . . . . . . 12
2.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.1 Data Separation Techniques for Multi-streamed SSDs 13
2.5.2 Write Traffic Reduction Techniques . . . . . . . . . 15
2.5.3 Program Context based Optimization Techniques for Operating Systems . . . . . . . . 18
III. Program Context-based Analysis . . . . . . . . . . . . . . . . 21
3.1 Definition and Extraction of Program Context . . . . . . . . 21
3.2 Data Lifetime Patterns of I/O Activities . . . . . . . . . . . 24
3.3 Duplicate Data Patterns of I/O Activities . . . . . . . . . . . 26
IV. Fully Automatic Stream Management For Multi-Streamed SSDs Using Program Contexts . . 29
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.1 No Automatic Stream Management for General I/O Workloads . . . . . . . . . 33
4.2.2 Limited Number of Supported Streams . . . . . . . 36
4.3 Automatic I/O Activity Management . . . . . . . . . . . . . 38
4.3.1 PC as a Unit of Lifetime Classification for General I/O Workloads . . . . . . . . . . . 39
4.4 Support for Large Number of Streams . . . . . . . . . . . . 41
4.4.1 PCs with Large Lifetime Variances . . . . . . . . . 42
4.4.2 Implementation of Internal Streams . . . . . . . . . 44
4.5 Design and Implementation of PCStream . . . . . . . . . . 46
4.5.1 PC Lifetime Management . . . . . . . . . . . . . . 46
4.5.2 Mapping PCs to SSD streams . . . . . . . . . . . . 49
4.5.3 Internal Stream Management . . . . . . . . . . . . . 50
4.5.4 PC Extraction for Indirect Writes . . . . . . . . . . 51
4.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . 53
4.6.1 Experimental Settings . . . . . . . . . . . . . . . . 53
4.6.2 Performance Evaluation . . . . . . . . . . . . . . . 55
4.6.3 WAF Comparison . . . . . . . . . . . . . . . . . . . 56
4.6.4 Per-stream Lifetime Distribution Analysis . . . . . . 57
4.6.5 Impact of Internal Streams . . . . . . . . . . . . . . 58
4.6.6 Impact of the PC Attribute Table . . . . . . . . . . . 60
V. Deduplication Technique using Program Contexts . . . . . . 62
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2 Selective Deduplication using Program Contexts . . . . . . . 63
5.2.1 PCDedup: Improving SSD Deduplication Efficiency using Selective Hash Cache Management . . . . . . 63
5.2.2 2-level LRU Eviction Policy . . . . . . . . . . . . . 68
5.3 Exploiting Small Chunk Size . . . . . . . . . . . . . . . . . 70
5.3.1 Fine-Grained Deduplication . . . . . . . . . . . . . 70
5.3.2 Read Overhead Management . . . . . . . . . . . . . 76
5.3.3 Memory Overhead Management . . . . . . . . . . . 80
5.3.4 Experimental Results . . . . . . . . . . . . . . . . . 82
VI. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.1 Summary and Conclusions . . . . . . . . . . . . . . . . . . 88
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.2.1 Supporting applications that have unusual program contexts . . . . . . . . . . . . . 89
6.2.2 Optimizing read request based on the I/O context . . 90
6.2.3 Exploiting context information to improve fingerprint lookups . . . . . . . . . . 91
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
The Design and Implementation of a High-Performance Log-Structured RAID System for ZNS SSDs
Zoned Namespace (ZNS) defines a new abstraction for host software to flexibly
manage storage in flash-based SSDs as append-only zones. It also provides a
Zone Append primitive to further boost the write performance of ZNS SSDs by
exploiting intra-zone parallelism. However, making Zone Append effective for
reliable and scalable storage, in the form of a RAID array of multiple ZNS
SSDs, is non-trivial since Zone Append offloads address management to ZNS SSDs
and requires the host to explicitly manage RAID stripes across multiple drives.
We propose ZapRAID, a high-performance log-structured RAID system for ZNS SSDs
by carefully exploiting Zone Append to achieve high write parallelism and
lightweight stripe management. ZapRAID adopts a group-based data layout with a
coarse-grained ordering across multiple groups of stripes, such that it can use
small-size metadata for stripe management on a per-group basis under Zone
Append. It further adopts hybrid data management to simultaneously achieve
intra-zone and inter-zone parallelism through a careful combination of both
Zone Append and Zone Write primitives. We evaluate ZapRAID using
microbenchmarks, trace-driven experiments, and real-application experiments.
Our evaluation results show that ZapRAID achieves high write throughput and
maintains high performance in normal reads, degraded reads, crash recovery, and
full-drive recovery.
Comment: 29 pages.
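The group-based layout can be illustrated with a toy model. The group size and metadata shape below are assumptions, not ZapRAID's actual format: stripes are ordered only at group granularity, so the host keeps one compact metadata entry per group instead of tracking the exact device-chosen address of every Zone Append.

```python
GROUP_SIZE = 4                      # stripes per group (assumed value)

def stripe_group(stripe_id):
    """Coarse-grained ordering: stripes are identified by their group."""
    return stripe_id // GROUP_SIZE

def group_metadata(completed_stripes):
    """One compact entry per group: only the group's stripe-ID bounds are
    recorded, not the per-stripe completion order that Zone Append hides."""
    meta = {}
    for s in completed_stripes:
        g = stripe_group(s)
        lo, hi = meta.get(g, (s, s))
        meta[g] = (min(lo, s), max(hi, s))
    return meta

# Zone Append may complete stripes out of order within a group...
meta = group_metadata([2, 0, 1, 3, 6, 4])
# ...but the host only needs coarse per-group bounds to locate them later.
assert meta == {0: (0, 3), 1: (4, 6)}
```

This is the trade-off the abstract names: within a group the drives are free to exploit Zone Append's intra-zone parallelism, while the small per-group records keep stripe management lightweight for crash and full-drive recovery.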
Memory Subsystems for Security, Consistency, and Scalability
In response to the continuous demand for the ability to process ever larger datasets, as well as discoveries in next-generation memory technologies, researchers have been vigorously studying memory-driven computing architectures that will allow data-intensive applications to access enormous amounts of pooled non-volatile memory. As applications continue to interact with increasing numbers of components and datasets, existing systems struggle to efficiently enforce the principle of least privilege for security. While non-volatile memory can retain data even after a power loss and allows for large main memory capacity, programmers have to bear the burdens of maintaining the consistency of program memory for fault tolerance as well as handling huge datasets with traditional yet expensive memory management interfaces for scalability. Today's computer systems have become too sophisticated for existing memory subsystems to handle many design requirements. In this dissertation, we introduce three memory subsystems to address challenges in terms of security, consistency, and scalability. Specifically, we propose SMVs to provide threads with fine-grained control over access privileges for a partially shared address space for security, NVthreads to allow programmers to easily leverage non-volatile memory with automatic persistence for consistency, and PetaMem to enable memory-centric applications to freely access memory beyond the traditional process boundary with support for memory isolation and crash recovery for security, consistency, and scalability.