
    ๋น…๋ฐ์ดํ„ฐ์˜ ํšจ์œจ์ ์ธ ์Šค์นด์ด๋ผ์ธ ์งˆ์˜ ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ ๋ณ‘๋ ฌ์ฒ˜๋ฆฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2017. 2. Kyuseok Shim.

The skyline operator and its variants, such as the dynamic skyline, reverse skyline, and probabilistic skyline operators, have attracted considerable attention recently because of their broad applications. Computing a skyline is challenging today, however, because it often has to be done over big data. For such data-intensive applications, the MapReduce framework has been widely used. In this dissertation, we propose efficient parallel algorithms for processing skyline, dynamic skyline, reverse skyline, and probabilistic skyline queries using MapReduce. For skyline, dynamic skyline, and reverse skyline queries, we first build quadtree-based histograms to prune non-skyline points. We next partition the data according to the regions defined by the histograms and compute candidate skyline points for each partition using MapReduce. Finally, in every partition, we use MapReduce to check whether each candidate is actually a skyline point. For the probabilistic skyline query, we first introduce three filtering techniques to prune points that cannot be probabilistic skyline points. We then build a quadtree-based histogram and split the data into partitions according to the regions defined by the quadtree. Finally, we compute the probabilistic skyline points for each partition using MapReduce. We also develop workload balancing methods that make the estimated execution times of all available machines similar.
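The partition-then-verify idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the dissertation's SKY-MR algorithm: a uniform grid stands in for the quadtree-based histogram, plain Python loops stand in for the two MapReduce jobs, smaller values are assumed to be better in every dimension, and points are assumed to lie in [0, 1]. All function names (`dominates`, `cell_of`, `two_phase_skyline`, and so on) are hypothetical.

```python
from collections import defaultdict


def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Brute-force skyline: keep points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def cell_of(p, cells_per_dim):
    """Grid cell index of a point in [0, 1]^d (assumed stand-in for a quadtree leaf)."""
    return tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in p)


def cell_dominates(c1, c2):
    """If c1 is strictly smaller in every dimension, every point in c1 dominates every point in c2."""
    return all(a < b for a, b in zip(c1, c2))


def two_phase_skyline(points, cells_per_dim=4):
    # Partition the data by grid cell; only non-empty cells are recorded.
    partitions = defaultdict(list)
    for p in points:
        partitions[cell_of(p, cells_per_dim)].append(p)

    # Pruning: drop any cell that is dominated by another non-empty cell.
    cells = list(partitions)
    live = [c for c in cells if not any(cell_dominates(o, c) for o in cells)]

    # Phase 1 (local): candidate skyline points within each surviving partition.
    candidates = {c: skyline(partitions[c]) for c in live}

    # Phase 2 (global): check each candidate against candidates from the
    # other cells that could contain a dominating point.
    result = []
    for c, cand in candidates.items():
        rivals = [
            q
            for o, qs in candidates.items()
            if o != c and all(a <= b for a, b in zip(o, c))
            for q in qs
        ]
        result.extend(p for p in cand if not any(dominates(q, p) for q in rivals))
    return sorted(result)
```

In this sketch each partition's local pass and each partition's global check are independent of one another, which is what would let them run as separate parallel tasks; the grid resolution `cells_per_dim` is an arbitrary choice made for illustration.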
We ran experiments comparing our algorithms with state-of-the-art MapReduce algorithms and confirmed both the effectiveness and the scalability of the proposed skyline algorithms.

Contents

1 Introduction
    1.1 Motivation
    1.2 Contributions of This Dissertation
    1.3 Dissertation Overview
2 Related Work
    2.1 Skyline Queries
    2.2 Reverse Skyline Queries
    2.3 Probabilistic Skyline Queries
3 Background
    3.1 Skyline and Its Variants
    3.2 MapReduce Framework
4 Parallel Skyline Query Processing
    4.1 SKY-MR: Our Skyline Computation Algorithm
        4.1.1 SKY-QTREE: The Sky-Quadtree Building Algorithm
        4.1.2 L-SKY-MR: The Local Skyline Computation Algorithm
        4.1.3 G-SKY-MR: The Global Skyline Computation Algorithm
    4.2 Experiment
        4.2.1 Performance Results for Skylines
        4.2.2 Performance Results in Other Environments
5 Parallel Reverse Skyline Query Processing
    5.1 RSKY-MR: Our Reverse Skyline Computation Algorithm
        5.1.1 RSKY-QTREE: The Rsky-Quadtree Building Algorithm
        5.1.2 Computations of Reverse Skylines using Rsky-Quadtrees
        5.1.3 L-RSKY-MR: The Local Reverse Skyline Computation Algorithm
        5.1.4 G-RSKY-MR: The Global Reverse Skyline Computation Algorithm
    5.2 Experiment
        5.2.1 Performance Results for Reverse Skylines
6 Parallel Probabilistic Skyline Query Processing
    6.1 Early Pruning Techniques
        6.1.1 Upper-bound Filtering
        6.1.2 Zero-probability Filtering
        6.1.3 Dominance-Power Filtering
    6.2 Utilization of a PS-QTREE for Pruning
        6.2.1 Generating a PS-QTREE
        6.2.2 Exploiting a PS-QTREE for Filtering
        6.2.3 Partitioning Objects by a PS-QTREE
    6.3 PS-QPF-MR: Our Algorithm with Quadtree Partitioning and Filtering
        6.3.1 Optimizations of PS-QPF-MR
        6.3.2 Sample Size and Split Threshold of a PSQtree
    6.4 PS-BRF-MR: Our Algorithm with Random Partitioning and Filtering
    6.5 Experiments
        6.5.1 Performance Results for Probabilistic Skylines
7 Conclusion
Bibliography
Abstract (In Korean)