261 research outputs found

    ๋‚ธ๋“œ ํ”Œ๋ž˜์‹œ ์ €์žฅ์žฅ์น˜์˜ ์„ฑ๋Šฅ ๋ฐ ์ˆ˜๋ช… ํ–ฅ์ƒ์„ ์œ„ํ•œ ํ”„๋กœ๊ทธ๋žจ ์ปจํ…์ŠคํŠธ ๊ธฐ๋ฐ˜ ์ตœ์ ํ™” ๊ธฐ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2019. 2. ๊น€์ง€ํ™.์ปดํ“จํŒ… ์‹œ์Šคํ…œ์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์œ„ํ•ด, ๊ธฐ์กด์˜ ๋Š๋ฆฐ ํ•˜๋“œ๋””์Šคํฌ(HDD)๋ฅผ ๋น ๋ฅธ ๋‚ธ๋“œ ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ ๊ธฐ๋ฐ˜ ์ €์žฅ์žฅ์น˜(SSD)๋กœ ๋Œ€์ฒดํ•˜๊ณ ์ž ํ•˜๋Š” ์—ฐ๊ตฌ๊ฐ€ ์ตœ๊ทผ ํ™œ๋ฐœํžˆ ์ง„ํ–‰ ๋˜๊ณ  ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ง€์†์ ์ธ ๋ฐ˜๋„์ฒด ๊ณต์ • ์Šค์ผ€์ผ๋ง ๋ฐ ๋ฉ€ํ‹ฐ ๋ ˆ๋ฒจ๋ง ๊ธฐ์ˆ ๋กœ SSD ๊ฐ€๊ฒฉ์„ ๋™๊ธ‰ HDD ์ˆ˜์ค€์œผ๋กœ ๋‚ฎ์•„์กŒ์ง€๋งŒ, ์ตœ๊ทผ์˜ ์ฒจ๋‹จ ๋””๋ฐ”์ด์Šค ๊ธฐ์ˆ ์˜ ๋ถ€์ž‘์šฉ์œผ ๋กœ NAND ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ์˜ ์ˆ˜๋ช…์ด ์งง์•„์ง€๋Š” ๊ฒƒ์€ ๊ณ ์„ฑ๋Šฅ ์ปดํ“จํŒ… ์‹œ์Šคํ…œ์—์„œ์˜ SSD์˜ ๊ด‘๋ฒ”์œ„ํ•œ ์ฑ„ํƒ์„ ๋ง‰๋Š” ์ฃผ์š” ์žฅ๋ฒฝ ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ตœ๊ทผ์˜ ๊ณ ๋ฐ€๋„ ๋‚ธ๋“œ ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ์˜ ์ˆ˜๋ช… ๋ฐ ์„ฑ๋Šฅ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ์‹œ์Šคํ…œ ๋ ˆ๋ฒจ์˜ ๊ฐœ์„  ๊ธฐ์ˆ ์„ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆ ๋œ ๊ธฐ๋ฒ•์€ ์‘์šฉ ํ”„๋กœ ๊ทธ๋žจ์˜ ์“ฐ๊ธฐ ๋ฌธ๋งฅ์„ ํ™œ์šฉํ•˜์—ฌ ๊ธฐ์กด์—๋Š” ์–ป์„ ์ˆ˜ ์—†์—ˆ๋˜ ๋ฐ์ดํ„ฐ ์ˆ˜๋ช… ํŒจํ„ด ๋ฐ ์ค‘๋ณต ๋ฐ์ดํ„ฐ ํŒจํ„ด์„ ๋ถ„์„ํ•˜์˜€๋‹ค. ์ด์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ, ๋‹จ์ผ ๊ณ„์ธต์˜ ๋‹จ์ˆœํ•œ ์ •๋ณด๋งŒ์„ ํ™œ์šฉํ–ˆ ๋˜ ๊ธฐ์กด ๊ธฐ๋ฒ•์˜ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•จ์œผ๋กœ์จ ํšจ๊ณผ์ ์œผ๋กœ NAND ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ์˜ ์„ฑ๋Šฅ ๋ฐ ์ˆ˜๋ช…์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ์ตœ์ ํ™” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์‹œํ•œ๋‹ค. ๋จผ์ €, ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์˜ I/O ์ž‘์—…์—๋Š” ๋ฌธ๋งฅ์— ๋”ฐ๋ผ ๊ณ ์œ ํ•œ ๋ฐ์ดํ„ฐ ์ˆ˜๋ช…๊ณผ ์ค‘ ๋ณต ๋ฐ์ดํ„ฐ์˜ ํŒจํ„ด์ด ์กด์žฌํ•œ๋‹ค๋Š” ์ ์„ ๋ถ„์„์„ ํ†ตํ•ด ํ™•์ธํ•˜์˜€๋‹ค. ๋ฌธ๋งฅ ์ •๋ณด๋ฅผ ํšจ๊ณผ ์ ์œผ๋กœ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•ด ํ”„๋กœ๊ทธ๋žจ ์ปจํ…์ŠคํŠธ (์“ฐ๊ธฐ ๋ฌธ๋งฅ) ์ถ”์ถœ ๋ฐฉ๋ฒ•์„ ๊ตฌํ˜„ ํ•˜์˜€๋‹ค. ํ”„๋กœ๊ทธ๋žจ ์ปจํ…์ŠคํŠธ ์ •๋ณด๋ฅผ ํ†ตํ•ด ๊ฐ€๋น„์ง€ ์ปฌ๋ ‰์…˜ ๋ถ€ํ•˜์™€ ์ œํ•œ๋œ ์ˆ˜๋ช…์˜ NAND ํ”Œ ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ ๊ฐœ์„ ์„ ์œ„ํ•œ ๊ธฐ์กด ๊ธฐ์ˆ ์˜ ํ•œ๊ณ„๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ๊ทน๋ณตํ•  ์ˆ˜ ์žˆ๋‹ค. ๋‘˜์งธ, ๋ฉ€ํ‹ฐ ์ŠคํŠธ๋ฆผ SSD์—์„œ WAF๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ ์ˆ˜๋ช… ์˜ˆ์ธก์˜ ์ •ํ™• ์„ฑ์„ ๋†’์ด๋Š” ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์˜ I/O ์ปจํ…์ŠคํŠธ๋ฅผ ํ™œ์šฉ ํ•˜๋Š” ์‹œ์Šคํ…œ ์ˆ˜์ค€์˜ ์ ‘๊ทผ ๋ฐฉ์‹์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์˜ ํ•ต์‹ฌ ๋™๊ธฐ๋Š” ๋ฐ์ดํ„ฐ ์ˆ˜๋ช…์ด LBA๋ณด๋‹ค ๋†’์€ ์ถ”์ƒํ™” ์ˆ˜์ค€์—์„œ ํ‰๊ฐ€ ๋˜์–ด์•ผ ํ•œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ๋”ฐ๋ผ์„œ ํ”„ ๋กœ๊ทธ๋žจ ์ปจํ…์ŠคํŠธ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฐ์ดํ„ฐ์˜ ์ˆ˜๋ช…์„ ๋ณด๋‹ค ์ •ํ™•ํžˆ ์˜ˆ์ธกํ•จ์œผ๋กœ์จ, ๊ธฐ์กด ๊ธฐ๋ฒ•์—์„œ LBA๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฐ์ดํ„ฐ ์ˆ˜๋ช…์„ ๊ด€๋ฆฌํ•˜๋Š” ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•œ๋‹ค. ๊ฒฐ๋ก ์ ์œผ ๋กœ ๋”ฐ๋ผ์„œ ๊ฐ€๋น„์ง€ ์ปฌ๋ ‰์…˜์˜ ํšจ์œจ์„ ๋†’์ด๊ธฐ ์œ„ํ•ด ์ˆ˜๋ช…์ด ์งง์€ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜๋ช…์ด ๊ธด ๋ฐ์ดํ„ฐ์™€ ํšจ๊ณผ์ ์œผ๋กœ ๋ถ„๋ฆฌ ํ•  ์ˆ˜ ์žˆ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ์“ฐ๊ธฐ ํ”„๋กœ๊ทธ๋žจ ์ปจํ…์ŠคํŠธ์˜ ์ค‘๋ณต ๋ฐ์ดํ„ฐ ํŒจํ„ด ๋ถ„์„์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ถˆํ•„์š”ํ•œ ์ค‘๋ณต ์ œ๊ฑฐ ์ž‘์—…์„ ํ”ผํ•  ์ˆ˜์žˆ๋Š” ์„ ํƒ์  ์ค‘๋ณต ์ œ๊ฑฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ค‘๋ณต ๋ฐ ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜์ง€ ์•Š๋Š” ํ”„๋กœ๊ทธ๋žจ ์ปจํ…์ŠคํŠธ๊ฐ€ ์กด์žฌํ•จ์„ ๋ถ„์„์ ์œผ๋กœ ๋ณด์ด๊ณ  ์ด๋“ค์„ ์ œ์™ธํ•จ์œผ๋กœ์จ, ์ค‘๋ณต์ œ๊ฑฐ ๋™์ž‘์˜ ํšจ์œจ์„ฑ์„ ๋†’์ผ ์ˆ˜ ์žˆ๋‹ค. ๋˜ํ•œ ์ค‘๋ณต ๋ฐ์ดํ„ฐ๊ฐ€ ๋ฐœ์ƒ ํ•˜๋Š” ํŒจํ„ด์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ ๊ธฐ๋ก๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ด€๋ฆฌํ•˜๋Š” ์ž๋ฃŒ๊ตฌ์กฐ ์œ ์ง€ ์ •์ฑ…์„ ์ƒˆ๋กญ๊ฒŒ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ถ”๊ฐ€์ ์œผ๋กœ, ์„œ๋ธŒ ํŽ˜์ด์ง€ ์ฒญํฌ๋ฅผ ๋„์ž…ํ•˜์—ฌ ์ค‘๋ณต ๋ฐ์ดํ„ฐ๋ฅผ ์ œ๊ฑฐ ํ•  ๊ฐ€๋Šฅ์„ฑ์„ ๋†’์ด๋Š” ์„ธ๋ถ„ํ™” ๋œ ์ค‘๋ณต ์ œ๊ฑฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆ ๋œ ๊ธฐ์ˆ ์˜ ํšจ๊ณผ๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ์‹ค์ œ ์‹œ์Šคํ…œ์—์„œ ์ˆ˜์ง‘ ๋œ I/O ํŠธ๋ ˆ์ด์Šค์— ๊ธฐ๋ฐ˜ํ•œ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ํ‰๊ฐ€ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์—๋ฎฌ๋ ˆ์ดํ„ฐ ๊ตฌํ˜„์„ ํ†ตํ•ด ์‹ค์ œ ์‘์šฉ์„ ๋™์ž‘ํ•˜๋ฉด์„œ ์ผ๋ จ์˜ ํ‰๊ฐ€๋ฅผ ์ˆ˜ํ–‰ํ–ˆ๋‹ค. ๋” ๋‚˜์•„๊ฐ€ ๋ฉ€ํ‹ฐ ์ŠคํŠธ๋ฆผ ๋””๋ฐ”์ด์Šค์˜ ๋‚ด๋ถ€ ํŽŒ์›จ์–ด๋ฅผ ์ˆ˜์ •ํ•˜์—ฌ ์‹ค์ œ์™€ ๊ฐ€์žฅ ๋น„์Šทํ•˜๊ฒŒ ์„ค์ •๋œ ํ™˜๊ฒฝ์—์„œ ์‹คํ—˜์„ ์ˆ˜ํ–‰ํ•˜ ์˜€๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด ์ œ์•ˆ๋œ ์‹œ์Šคํ…œ ์ˆ˜์ค€ ์ตœ์ ํ™” ๊ธฐ๋ฒ•์ด ์„ฑ๋Šฅ ๋ฐ ์ˆ˜๋ช… ๊ฐœ์„  ์ธก๋ฉด์—์„œ ๊ธฐ์กด ์ตœ์ ํ™” ๊ธฐ๋ฒ•๋ณด๋‹ค ๋” ํšจ๊ณผ์ ์ด์—ˆ์Œ์„ ํ™•์ธํ•˜์˜€๋‹ค. ํ–ฅํ›„ ์ œ์•ˆ๋œ ๊ธฐ ๋ฒ•๋“ค์ด ๋ณด๋‹ค ๋” ๋ฐœ์ „๋œ๋‹ค๋ฉด, ๋‚ธ๋“œ ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ดˆ๊ณ ์† ์ปดํ“จํŒ… ์‹œ์Šคํ…œ์˜ ์ฃผ ์ €์žฅ์žฅ์น˜๋กœ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ๋ฐ์— ๊ธ์ •์ ์ธ ๊ธฐ์—ฌ๋ฅผ ํ•  ์ˆ˜ ์žˆ์„ ๊ฒƒ์œผ๋กœ ๊ธฐ๋Œ€๋œ๋‹ค.Replacing HDDs with NAND flash-based storage devices (SSDs) has been one of the major challenges in modern computing systems especially in regards to better performance and higher mobility. Although the continuous semiconductor process scaling and multi-leveling techniques lower the price of SSDs to the comparable level of HDDs, the decreasing lifetime of NAND flash memory, as a side effect of recent advanced device technologies, is emerging as one of the major barriers to the wide adoption of SSDs in highperformance computing systems. In this dissertation, system-level lifetime improvement techniques for recent high-density NAND flash memory are proposed. Unlike existing techniques, the proposed techniques resolve the problems of decreasing performance and lifetime of NAND flash memory by exploiting the I/O context of an application to analyze data lifetime patterns or duplicate data contents patterns. We first present that I/O activities of an application have distinct data lifetime and duplicate data patterns. In order to effectively utilize the context information, we implemented the program context extraction method. With the program context, we can overcome the limitations of existing techniques for improving the garbage collection overhead and limited lifetime of NAND flash memory. Second, we propose a system-level approach to reduce WAF that exploits the I/O context of an application to increase the data lifetime prediction for the multi-streamed SSDs. The key motivation behind the proposed technique was that data lifetimes should be estimated at a higher abstraction level than LBAs, so we employ a write program context as a stream management unit. Thus, it can effectively separate data with short lifetimes from data with long lifetimes to improve the efficiency of garbage collection. Lastly, we propose a selective deduplication that can avoid unnecessary deduplication work based on the duplicate data pattern analysis of write program context. With the help of selective deduplication, we also propose fine-grained deduplication which improves the likelihood of eliminating redundant data by introducing sub-page chunk. It also resolves technical difficulties caused by its finer granularity, i.e., increased memory requirement and read response time. In order to evaluate the effectiveness of the proposed techniques, we performed a series of evaluations using both a trace-driven simulator and emulator with I/O traces which were collected from various real-world systems. To understand the feasibility of the proposed techniques, we also implemented them in Linux kernel on top of our in-house flash storage prototype and then evaluated their effects on the lifetime while running real-world applications. Our experimental results show that system-level optimization techniques are more effective over existing optimization techniques.I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.1 Garbage Collection Problem . . . . . . . . . . . . . 2 1.1.2 Limited Endurance Problem . . . . . . . . . . . . . 4 1.2 Dissertation Goals . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4 Dissertation Structure . . . . . . . . . . . . . . . . . . . . . 7 II. Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.1 NAND Flash Memory System Software . . . . . . . . . . . 9 2.2 NAND Flash-Based Storage Devices . . . . . . . . . . . . . 10 2.3 Multi-stream Interface . . . . . . . . . . . . . . . . . . . . 11 2.4 Inline Data Deduplication Technique . . . . . . . . . . . . . 12 2.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.5.1 Data Separation Techniques for Multi-streamed SSDs 13 2.5.2 Write Traffic Reduction Techniques . . . . . . . . . 15 2.5.3 Program Context based Optimization Techniques for Operating Systems . . . . . . . . 18 III. Program Context-based Analysis . . . . . . . . . . . . . . . . 21 3.1 Definition and Extraction of Program Context . . . . . . . . 21 3.2 Data Lifetime Patterns of I/O Activities . . . . . . . . . . . 24 3.3 Duplicate Data Patterns of I/O Activities . . . . . . . . . . . 26 IV. Fully Automatic Stream Management For Multi-Streamed SSDs Using Program Contexts . . 29 4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.2.1 No Automatic Stream Management for General I/O Workloads . . . . . . . . . 33 4.2.2 Limited Number of Supported Streams . . . . . . . 36 4.3 Automatic I/O Activity Management . . . . . . . . . . . . . 38 4.3.1 PC as a Unit of Lifetime Classification for General I/O Workloads . . . . . . . . . . . 39 4.4 Support for Large Number of Streams . . . . . . . . . . . . 41 4.4.1 PCs with Large Lifetime Variances . . . . . . . . . 42 4.4.2 Implementation of Internal Streams . . . . . . . . . 44 4.5 Design and Implementation of PCStream . . . . . . . . . . 46 4.5.1 PC Lifetime Management . . . . . . . . . . . . . . 46 4.5.2 Mapping PCs to SSD streams . . . . . . . . . . . . 49 4.5.3 Internal Stream Management . . . . . . . . . . . . . 50 4.5.4 PC Extraction for Indirect Writes . . . . . . . . . . 51 4.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . 53 4.6.1 Experimental Settings . . . . . . . . . . . . . . . . 53 4.6.2 Performance Evaluation . . . . . . . . . . . . . . . 55 4.6.3 WAF Comparison . . . . . . . . . . . . . . . . . . . 56 4.6.4 Per-stream Lifetime Distribution Analysis . . . . . . 57 4.6.5 Impact of Internal Streams . . . . . . . . . . . . . . 58 4.6.6 Impact of the PC Attribute Table . . . . . . . . . . . 60 V. Deduplication Technique using Program Contexts . . . . . . 62 5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5.2 Selective Deduplication using Program Contexts . . . . . . . 63 5.2.1 PCDedup: Improving SSD Deduplication Efficiency using Selective Hash Cache Management . . . . . . 63 5.2.2 2-level LRU Eviction Policy . . . . . . . . . . . . . 68 5.3 Exploiting Small Chunk Size . . . . . . . . . . . . . . . . . 70 5.3.1 Fine-Grained Deduplication . . . . . . . . . . . . . 70 5.3.2 Read Overhead Management . . . . . . . . . . . . . 76 5.3.3 Memory Overhead Management . . . . . . . . . . . 80 5.3.4 Experimental Results . . . . . . . . . . . . . . . . . 82 VI. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 6.1 Summary and Conclusions . . . . . . . . . . . . . . . . . . 88 6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 89 6.2.1 Supporting applications that have unusal program contexts . . . . . . . . . . . . . 89 6.2.2 Optimizing read request based on the I/O context . . 90 6.2.3 Exploiting context information to improve fingerprint lookups . . . . .. . . . . . 91 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92Docto

    HIODS: hybrid inline and offline deduplication system

    Get PDF
    Dissertaรงรฃo de mestrado integrado em Engenharia InformรกticaDeduplication is a technique that allows finding and removing duplicate data at storage systems. With the current exponential growth of digital information, this mechanism is becoming more and more desirable for reducing the infrastructural costs of persisting such data. Therefore, deduplication is now being widely applied to several storage appliances serving applications with different requirements (e.g., archival, backup, primary storage). However, deduplication requires additional processing logic for each storage request in order to detect and eliminate duplicate content. Traditionally, this processing is done in the I/O critical path (inline), thus introducing a performance penalty on the throughput and latency of requests being served by the storage appliance. An alternative solution is to do this process as a background task, thus outside of the I/O critical path (offline), at the cost of requiring additional storage space as duplicate content is not found and eliminated immediately. However, the choice of what type of strategy to use is typically done manually and does not take into consideration changes in the applications' workloads. This dissertation proposes HIODS, a hybrid deduplication solution capable of automati cally changing between inline and offline deduplication according to the requirements (e.g., desired storage I/O throughput goal) of applications and their dynamic workloads. The goal is to choose the best strategy that fulfills the targeted I/O performance objectives while optimizing deduplication space savings. Finally, a prototype of HIODS is implemented and evaluated extensively with different storage workloads. Results show that HIODS is able to change its deduplication mode dy namically, according to the storage workload being served, while balancing I/O performance and space savings requirements efficiently.A deduplicaรงรฃo รฉ uma tรฉcnica que permite encontrar e remover dados duplicados guardados nos sistemas de armazenamento. Com o crescimento exponencial da informaรงรฃo digital que vivemos atualmente, este mecanismo estรก a tornar-se cada vez mais popular para reduzir os custos das infraestruturas onde esses dados se encontram alojados. De facto, a deduplicaรงรฃo รฉ, hoje em dia, usada numa grande variedade de serviรงos de armazenamento que servem diferentes aplicaรงรตes com requisitos particulares (ex.: arquivo, backup, armazenamento primรกrio). No entanto, a deduplicaรงรฃo adiciona uma camada de processamento extra a cada pedido de armazenamento, de modo a conseguir detetar e eliminar o conteรบdo redundante. Tradicionalmente, este processo รฉ realizado durante o caminho crรญtico do I/O (inline), causando perdas de desempenho e aumentos na latรชncia dos pedidos processados. Uma alternativa รฉ alterar o processamento para segundo plano, aliviando assim os custos no caminho crรญtico do I/O (offline). Esta soluรงรฃo requer espaรงo de armazenamento adicional, visto que os duplicados nรฃo sรฃo encontrados nem eliminados imediatamente. No entanto, a estratรฉgia a seguir รฉ escolhida de forma manual, nรฃo tendo em consideraรงรฃo qualquer possรญvel mudanรงa na carga de trabalho das aplicaรงรตes. Esta dissertaรงรฃo propรตe assim o HIODS, um sistema de deduplicaรงรฃo hรญbrido capaz de alterar entre o modo inline e offline de forma automรกtica considerando os requisitos (ex.: dรฉbito do sistema de armazenamento desejado) das aplicaรงรตes e das suas cargas de trabalho dinรขmicas. Por fim, um protรณtipo do HIODS รฉ implementado e avaliado exaustivamente. Os resultados mostram que o HIODS รฉ capaz de alterar o modo de deduplicaรงรฃo de forma dinรขmica e de acordo com a carga de trabalho, considerando os requisitos de desempenho e a eliminaรงรฃo eficiente dos dados duplicados

    A survey and classification of storage deduplication systems

    Get PDF
    The automatic elimination of duplicate data in a storage system commonly known as deduplication is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid state disks, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development. The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed.This work is funded by the European Regional Development Fund (EDRF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundacao para a Ciencia e a Tecnologia (FCT; Portuguese Foundation for Science and Technology) within project RED FCOMP-01-0124-FEDER-010156 and the FCT by PhD scholarship SFRH-BD-71372-2010
    • โ€ฆ
    corecore