261 research outputs found
Program Context-based Optimization Techniques for Improving the Performance and Lifetime of Flash Storage Devices
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: Jihong Kim.
Replacing HDDs with NAND flash-based storage devices (SSDs) has been
one of the major challenges in modern computing systems, especially with regard to better performance and higher mobility. Although the continuous
semiconductor process scaling and multi-leveling techniques lower the price
of SSDs to the comparable level of HDDs, the decreasing lifetime of NAND
flash memory, as a side effect of recent advanced device technologies, is
emerging as one of the major barriers to the wide adoption of SSDs in high-performance computing systems.
In this dissertation, system-level lifetime improvement techniques for
recent high-density NAND flash memory are proposed. Unlike existing techniques, the proposed techniques resolve the problems of decreasing performance and lifetime of NAND flash memory by exploiting the I/O context
of an application to analyze data lifetime patterns or duplicate data contents
patterns.
We first show that the I/O activities of an application have distinct data
lifetime and duplicate data patterns. To exploit this context information effectively, we implemented a program context extraction method.
With the program context, we can overcome the limitations of existing techniques in reducing the garbage collection overhead and coping with the limited lifetime
of NAND flash memory.
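The extraction step can be illustrated with a short Python sketch. This is only an analogy: the dissertation extracts contexts from kernel-visible return addresses on the call stack, whereas here interpreter frame summaries stand in for them, and all names are hypothetical.

```python
import hashlib
import traceback

def program_context_id(depth=8):
    """Derive a program-context ID from the current call stack.

    Writes issued from the same code path (the same chain of callers)
    yield the same ID, so they can be grouped by the context that
    produced them. The last `depth` frame summaries stand in for the
    return addresses a kernel-level implementation would combine.
    """
    frames = traceback.extract_stack()[:-1][-depth:]  # drop our own frame
    key = "|".join(f"{f.filename}:{f.name}:{f.lineno}" for f in frames)
    return hashlib.sha1(key.encode()).hexdigest()[:8]
```

Two writes issued from the same call site map to the same ID, while writes from different code paths map to different IDs, which is the property the analysis relies on.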
Second, we propose a system-level approach to reduce the write amplification factor (WAF) by exploiting the I/O context of an application to improve the accuracy of data lifetime prediction for multi-streamed SSDs. The key motivation behind the proposed
technique was that data lifetimes should be estimated at a higher abstraction
level than LBAs, so we employ a write program context as a stream management unit. Thus, it can effectively separate data with short lifetimes from
data with long lifetimes to improve the efficiency of garbage collection.
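The stream-assignment idea can be sketched as follows. All names are hypothetical, and this simplification keeps only the core policy (a moving average of observed lifetimes per program context, bucketed into a limited number of streams); the actual technique also handles contexts with large lifetime variance.

```python
class PCStreamMapper:
    """Map program contexts (PCs) to a limited number of SSD streams.

    Each PC keeps a moving average of the lifetimes of the data it
    wrote (time from a write until the overwrite/TRIM that invalidates
    it). PCs are then bucketed into streams by that average, so
    short-lived and long-lived data land in different streams.
    """

    def __init__(self, num_streams):
        self.num_streams = num_streams
        self.avg_lifetime = {}

    def record_lifetime(self, pc, lifetime, alpha=0.25):
        # Exponential moving average of observed data lifetimes per PC.
        prev = self.avg_lifetime.get(pc, lifetime)
        self.avg_lifetime[pc] = (1 - alpha) * prev + alpha * lifetime

    def stream_of(self, pc):
        if pc not in self.avg_lifetime:
            return 0  # unseen PCs fall into a default stream
        ranked = sorted(self.avg_lifetime, key=self.avg_lifetime.get)
        idx = ranked.index(pc)
        # Bucket the rank into [0, num_streams - 1].
        return idx * self.num_streams // len(ranked)
```

Separating streams this way means blocks in the same stream tend to be invalidated together, so garbage collection copies fewer still-valid pages.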
Lastly, we propose selective deduplication, which avoids unnecessary deduplication work based on an analysis of the duplicate data patterns of write program contexts. Building on selective deduplication, we also propose fine-grained deduplication, which improves the likelihood of eliminating redundant data by introducing sub-page chunks. It also resolves the technical difficulties caused by the finer granularity, i.e., increased memory requirements
and read response time.
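The benefit of sub-page chunks can be shown with a minimal content-addressed store. This sketch assumes a 4 KiB page split into 1 KiB chunks (illustrative sizes, not the dissertation's parameters): two pages that differ in a single chunk still share the remaining identical chunks.

```python
import hashlib

PAGE = 4096   # assumed flash page size
CHUNK = 1024  # sub-page chunk: 4 chunks per page

class FineGrainedDedup:
    """Toy content-addressed store with sub-page chunking.

    Smaller chunks raise the chance of finding duplicates, at the cost
    of more fingerprints to keep and more lookups per page read.
    """

    def __init__(self):
        self.store = {}  # fingerprint -> chunk bytes (stored once)
        self.pages = {}  # lba -> list of chunk fingerprints

    def write(self, lba, page):
        fps = []
        for off in range(0, PAGE, CHUNK):
            chunk = page[off:off + CHUNK]
            fp = hashlib.sha1(chunk).digest()
            self.store.setdefault(fp, chunk)  # duplicate chunks collapse
            fps.append(fp)
        self.pages[lba] = fps

    def read(self, lba):
        # Finer granularity: a page read reassembles several chunks.
        return b"".join(self.store[fp] for fp in self.pages[lba])

    def unique_bytes(self):
        return len(self.store) * CHUNK
```

With whole-page fingerprints the two pages in the usage below would be stored in full; with sub-page chunks only the differing chunk costs extra space, which illustrates the increased elimination likelihood (and the extra metadata) the abstract refers to.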
In order to evaluate the effectiveness of the proposed techniques, we
performed a series of evaluations using both a trace-driven simulator and
emulator with I/O traces collected from various real-world systems. To assess the feasibility of the proposed techniques, we also implemented them in the Linux kernel on top of our in-house flash storage prototype and then evaluated their effects on the lifetime while running real-world
applications. Our experimental results show that system-level optimization
techniques are more effective than existing optimization techniques.

I. Introduction
 1.1 Motivation
  1.1.1 Garbage Collection Problem
  1.1.2 Limited Endurance Problem
 1.2 Dissertation Goals
 1.3 Contributions
 1.4 Dissertation Structure
II. Background
 2.1 NAND Flash Memory System Software
 2.2 NAND Flash-Based Storage Devices
 2.3 Multi-stream Interface
 2.4 Inline Data Deduplication Technique
 2.5 Related Work
  2.5.1 Data Separation Techniques for Multi-streamed SSDs
  2.5.2 Write Traffic Reduction Techniques
  2.5.3 Program Context based Optimization Techniques for Operating Systems
III. Program Context-based Analysis
 3.1 Definition and Extraction of Program Context
 3.2 Data Lifetime Patterns of I/O Activities
 3.3 Duplicate Data Patterns of I/O Activities
IV. Fully Automatic Stream Management For Multi-Streamed SSDs Using Program Contexts
 4.1 Overview
 4.2 Motivation
  4.2.1 No Automatic Stream Management for General I/O Workloads
  4.2.2 Limited Number of Supported Streams
 4.3 Automatic I/O Activity Management
  4.3.1 PC as a Unit of Lifetime Classification for General I/O Workloads
 4.4 Support for Large Number of Streams
  4.4.1 PCs with Large Lifetime Variances
  4.4.2 Implementation of Internal Streams
 4.5 Design and Implementation of PCStream
  4.5.1 PC Lifetime Management
  4.5.2 Mapping PCs to SSD streams
  4.5.3 Internal Stream Management
  4.5.4 PC Extraction for Indirect Writes
 4.6 Experimental Results
  4.6.1 Experimental Settings
  4.6.2 Performance Evaluation
  4.6.3 WAF Comparison
  4.6.4 Per-stream Lifetime Distribution Analysis
  4.6.5 Impact of Internal Streams
  4.6.6 Impact of the PC Attribute Table
V. Deduplication Technique using Program Contexts
 5.1 Overview
 5.2 Selective Deduplication using Program Contexts
  5.2.1 PCDedup: Improving SSD Deduplication Efficiency using Selective Hash Cache Management
  5.2.2 2-level LRU Eviction Policy
 5.3 Exploiting Small Chunk Size
  5.3.1 Fine-Grained Deduplication
  5.3.2 Read Overhead Management
  5.3.3 Memory Overhead Management
  5.3.4 Experimental Results
VI. Conclusions
 6.1 Summary and Conclusions
 6.2 Future Work
  6.2.1 Supporting applications that have unusual program contexts
  6.2.2 Optimizing read requests based on the I/O context
  6.2.3 Exploiting context information to improve fingerprint lookups
Bibliography
HIODS: hybrid inline and offline deduplication system
Integrated master's dissertation in Informatics Engineering

Deduplication is a technique that allows finding and removing duplicate data in storage
systems. With the current exponential growth of digital information, this mechanism is
becoming more and more desirable for reducing the infrastructural costs of persisting such
data. Therefore, deduplication is now being widely applied to several storage appliances
serving applications with different requirements (e.g., archival, backup, primary storage).
However, deduplication requires additional processing logic for each storage request in
order to detect and eliminate duplicate content. Traditionally, this processing is done in
the I/O critical path (inline), thus introducing a performance penalty on the throughput
and latency of requests being served by the storage appliance. An alternative solution is to
do this process as a background task, thus outside of the I/O critical path (offline), at the
cost of requiring additional storage space as duplicate content is not found and eliminated
immediately. However, the choice of what type of strategy to use is typically done manually
and does not take into consideration changes in the applications' workloads.
This dissertation proposes HIODS, a hybrid deduplication solution capable of automatically changing between inline and offline deduplication according to the requirements (e.g.,
desired storage I/O throughput goal) of applications and their dynamic workloads. The
goal is to choose the best strategy that fulfills the targeted I/O performance objectives while
optimizing deduplication space savings.
Finally, a prototype of HIODS is implemented and evaluated extensively with different
storage workloads. Results show that HIODS is able to change its deduplication mode dynamically, according to the storage workload being served, while balancing I/O performance
and space savings requirements efficiently.
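The mode-switching decision that HIODS automates can be sketched as a toy model. Everything here is an assumed simplification (hypothetical names, a single throughput threshold): the real system's policy weighs I/O performance goals against deduplication space savings.

```python
import hashlib

class HybridDedup:
    """Toy model of hybrid inline/offline deduplication.

    While measured throughput meets the goal, duplicates are removed
    inline (in the I/O critical path); when it drops below the goal,
    writes are stored as-is and queued for a background pass, trading
    temporary extra space for lower request latency.
    """

    def __init__(self, throughput_goal_mbps):
        self.goal = throughput_goal_mbps
        self.inline = True
        self.fingerprints = set()
        self.stored_blocks = 0
        self.offline_queue = []

    def observe_throughput(self, measured_mbps):
        # Switch mode based on the current workload's throughput.
        self.inline = measured_mbps >= self.goal

    def write(self, block):
        if self.inline:
            self._dedup(block)             # inline: dedup on the write path
        else:
            self.stored_blocks += 1        # offline: keep the copy for now
            self.offline_queue.append(block)

    def run_offline_pass(self):
        # Background task: reclaim space deferred during heavy load.
        while self.offline_queue:
            block = self.offline_queue.pop()
            self.stored_blocks -= 1
            self._dedup(block)

    def _dedup(self, block):
        fp = hashlib.sha1(block).digest()
        if fp not in self.fingerprints:
            self.fingerprints.add(fp)
            self.stored_blocks += 1        # only unique content is stored
```

The usage pattern mirrors the abstract: inline mode removes duplicates immediately, offline mode defers the work and temporarily consumes extra space until the background pass runs.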
A survey and classification of storage deduplication systems
The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid state disks, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development.
The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope.
This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed.

This work is funded by the European Regional Development Fund (ERDF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundação para a Ciência e a Tecnologia (FCT; Portuguese Foundation for Science and Technology) within project RED FCOMP-01-0124-FEDER-010156 and by FCT PhD scholarship SFRH-BD-71372-2010.
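The survey's six design axes lend themselves to a small typed record for describing any one system. The enum values and the example entry below are illustrative placeholders, not the survey's own tables.

```python
from dataclasses import dataclass
from enum import Enum

class Granularity(Enum):
    WHOLE_FILE = "whole file"
    FIXED_BLOCK = "fixed-size block"
    VARIABLE_BLOCK = "content-defined chunk"

class Timing(Enum):
    INLINE = "inline"    # dedup in the I/O critical path
    OFFLINE = "offline"  # dedup as a background task

@dataclass
class DedupSystem:
    """One storage deduplication system described along the six
    classification criteria: granularity, locality, timing, indexing,
    technique, and scope."""
    name: str
    granularity: Granularity
    locality: str   # e.g. temporal/spatial locality exploited by the index
    timing: Timing
    indexing: str   # e.g. full index, sparse index, filter front-end
    technique: str  # e.g. aliasing (exact match) vs. delta encoding
    scope: str      # e.g. single node vs. distributed

# Hypothetical example entry for an archival-style system.
example_system = DedupSystem(
    name="example archival store",
    granularity=Granularity.FIXED_BLOCK,
    locality="none",
    timing=Timing.INLINE,
    indexing="full on-disk index",
    technique="aliasing",
    scope="single node",
)
```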
- …