
    Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources

    The growing deployment of sensors as part of the Internet of Things (IoT) is generating thousands of event streams. Complex Event Processing (CEP) queries offer a useful paradigm for rapid decision-making over such data sources. While CEP is often centralized in the Cloud, the deployment of capable edge devices in the field motivates the need for cooperative event analytics that span Edge and Cloud computing. Here, we identify a novel problem of query placement on Edge and Cloud resources for dynamically arriving and departing analytic dataflows. We define this as an optimization problem that minimizes the total makespan for all event analytics while meeting the energy and compute constraints of the resources. We propose 4 adaptive heuristics and 3 rebalancing strategies for such dynamic dataflows, and validate them using detailed simulations for 100–1000 edge devices and VMs. The results show that our heuristics offer planning times on the order of seconds, give a valid and high-quality solution in all cases, and reduce the number of query migrations. Furthermore, applying the rebalancing strategies within these heuristics reduces the makespan significantly, by around 20–25%.
    Comment: 11 pages, 7 figures
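    To make the optimization concrete, here is a minimal Python sketch of one generic placement heuristic, greedy largest-first list scheduling, under a cost model assumed purely for illustration: each resource has a speed, a cap on hosted queries (the compute constraint), an energy budget, and a linear energy cost per unit of work, and makespan is taken as the latest finish time across resources. `Resource` and `place_queries` are hypothetical names; this is not one of the paper's four heuristics.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """An edge device or cloud VM under a hypothetical cost model (not the paper's)."""
    name: str
    speed: float            # work units processed per second
    cpu_slots: int          # compute constraint: max queries hosted at once
    energy_budget: float    # energy constraint in joules; inf for mains-powered VMs
    joules_per_work: float  # assumed linear energy cost per unit of work
    load: float = 0.0
    energy_used: float = 0.0
    queries: List[str] = field(default_factory=list)

    def finish_time(self, extra_work: float = 0.0) -> float:
        return (self.load + extra_work) / self.speed

    def can_host(self, work: float) -> bool:
        # Both constraints must still hold after adding this query.
        return (len(self.queries) < self.cpu_slots
                and self.energy_used + work * self.joules_per_work <= self.energy_budget)


def place_queries(queries: Dict[str, float], resources: List[Resource]) -> float:
    """Greedy list scheduling: place the largest query first on the feasible
    resource whose finish time would be smallest; return the makespan."""
    for qid, work in sorted(queries.items(), key=lambda kv: kv[1], reverse=True):
        feasible = [r for r in resources if r.can_host(work)]
        if not feasible:
            raise RuntimeError(f"no resource can host {qid} within its constraints")
        best = min(feasible, key=lambda r: r.finish_time(work))
        best.load += work
        best.energy_used += work * best.joules_per_work
        best.queries.append(qid)
    return max(r.finish_time() for r in resources)


edge = Resource("edge-1", speed=1.0, cpu_slots=2, energy_budget=50.0, joules_per_work=2.0)
vm = Resource("vm-1", speed=4.0, cpu_slots=8, energy_budget=float("inf"), joules_per_work=0.0)
makespan = place_queries({"q1": 20.0, "q2": 10.0, "q3": 8.0, "q4": 4.0}, [edge, vm])
print(makespan, edge.queries, vm.queries)  # 8.5 ['q3'] ['q1', 'q2', 'q4']
```

    Largest-first ordering is the classic LPT rule for makespan. The paper's heuristics additionally adapt to dataflows arriving and departing over time and add rebalancing (migration) strategies, which this static sketch omits.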

    DualTable: A Hybrid Storage Model for Update Optimization in Hive

    Hive is the most mature and prevalent data warehouse tool providing a SQL-like interface in the Hadoop ecosystem. It is used successfully in many Internet companies and has shown its value for big data processing in traditional industries. However, enterprise big data processing systems, such as those in Smart Grid applications, usually require complex business logic and involve many data manipulation operations such as updates and deletes. Hive cannot support these operations adequately while preserving high query performance. Hive on the Hadoop Distributed File System (HDFS) cannot implement data manipulation efficiently, and Hive on HBase suffers from poor query performance even though it supports faster data manipulation. A project based on Hive issue Hive-5317 aims to support update operations, but it has not been completed in Hive's latest version. Since this ACID-compliant extension adopts the same data storage format on HDFS, it does not solve the update performance problem. In this paper, we propose a hybrid storage model called DualTable, which combines the efficient streaming reads of HDFS with the random-write capability of HBase. Hive on DualTable provides better data manipulation support while preserving query performance. Experiments on a TPC-H data set and on a real smart grid data set show that Hive on DualTable is up to 10 times faster than Hive when executing update and delete operations.
    Comment: accepted by the industry session of ICDE201
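    The core idea of a hybrid store (bulk data in a scan-friendly file store, plus a random-write table holding row-level changes that are merged on read) can be sketched in a few lines of Python. Everything here is an assumption for illustration: `DualTableSketch`, the tombstone marker, the in-memory list and dict standing in for HDFS and HBase, and the `compact` step are hypothetical, not DualTable's actual interfaces.

```python
from typing import Dict, Iterator, List, Optional, Tuple

DELETED = object()  # tombstone stored in the attached table for deleted rows


class DualTableSketch:
    """Hypothetical model of a master/attached table pair.

    `master` stands in for immutable HDFS files (cheap sequential scans);
    `attached` stands in for an HBase table (cheap random writes by row id).
    """

    def __init__(self, rows: List[Tuple[int, dict]]):
        self.master: List[Tuple[int, dict]] = list(rows)
        self.attached: Dict[int, object] = {}

    def _lookup(self, row_id: int) -> Optional[dict]:
        if row_id in self.attached:
            rec = self.attached[row_id]
            return None if rec is DELETED else rec
        for rid, rec in self.master:
            if rid == row_id:
                return rec
        return None

    def update(self, row_id: int, changes: dict) -> None:
        current = self._lookup(row_id)
        if current is None:
            raise KeyError(row_id)
        # Write only to the attached table; master files are never rewritten here.
        self.attached[row_id] = {**current, **changes}

    def delete(self, row_id: int) -> None:
        self.attached[row_id] = DELETED

    def scan(self) -> Iterator[Tuple[int, dict]]:
        # Merge on read: stream the master rows and overlay attached entries.
        for rid, rec in self.master:
            current = self.attached.get(rid, rec)
            if current is not DELETED:
                yield rid, current

    def compact(self) -> None:
        # Assumed maintenance step: fold attached changes into a new master.
        self.master = list(self.scan())
        self.attached.clear()


table = DualTableSketch([(1, {"meter": "A", "kwh": 10}),
                         (2, {"meter": "B", "kwh": 7}),
                         (3, {"meter": "C", "kwh": 3})])
table.update(2, {"kwh": 9})
table.delete(3)
rows = list(table.scan())
print(rows)  # [(1, {'meter': 'A', 'kwh': 10}), (2, {'meter': 'B', 'kwh': 9})]
table.compact()
```

    Updates and deletes touch only the small random-write table, while full scans still stream the base data sequentially, which is the trade-off the abstract describes.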

    End-to-End Optimization for Fast and Efficient IoT Stream Processing in Cloud Environments

    Thesis (Ph.D.), Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2021. Taegeon Um (엄태건).
    As large volumes of data streams are generated by Internet of Things (IoT) devices, two types of IoT stream queries are deployed in the cloud. One is the small IoT stream query, which continuously processes a few low-rate data streams (e.g., one event per second) from end users' IoT devices. The other is the big IoT stream query, which data scientists deploy to continuously process a large number of aggregated data streams whose volume can fluctuate suddenly within a short period (bursty loads). However, existing work and stream systems fall short of handling such workloads efficiently because their query submission, compilation, execution, and resource acquisition layers are not optimized for them. This dissertation proposes two end-to-end optimization techniques that optimize not only the stream query execution layer (runtime) but also the query submission, compiler, or resource acquisition layers. First, to minimize the number of cloud machines and the server maintenance cost of processing many small IoT queries, we build Pluto, a new stream processing system that optimizes both the query submission and execution layers to handle many small IoT stream queries efficiently. By decoupling IoT query submission from code registration and offering new APIs, Pluto mitigates the query submission bottleneck and enables efficient resource sharing across small IoT stream queries during execution. Second, to quickly handle sudden bursty loads and scale out big IoT stream queries, we build Sponge, a new stream system that optimizes the query compilation, execution, and resource acquisition layers together. For fast acquisition of new resources, Sponge uses a new cloud computing service called Lambda, which offers fast-to-start lightweight containers.
Sponge then converts the streaming dataflow of big stream queries to overcome Lambda's resource constraints and to minimize scaling overheads at runtime. Our evaluations show that the end-to-end optimization techniques significantly improve system throughput and latency compared to existing stream systems, both in handling a large number of small IoT stream queries and in handling bursty loads of big IoT stream queries.
    As large volumes of data streams are generated by diverse IoT devices, two broad types of stream queries run in the cloud. The first is the small IoT stream query: each query processes a small amount of IoT data, and there are many such small queries. The second is the big IoT stream query: a single query processes large and rapidly increasing IoT data streams. However, in existing work and stream systems, the query execution, submission, compiler, and resource acquisition layers are not optimized for these workloads, so they cannot process small and big IoT stream queries efficiently. This dissertation introduces end-to-end optimization techniques for small and big IoT stream query workloads. First, to process many small IoT stream queries, it applies IoT-aware optimization to the query submission and execution layers: decoupling query submission from code registration and providing new APIs for it reduces query submission overhead, and sharing resources according to IoT characteristics reduces query execution overhead.
Second, to quickly handle rapidly increasing loads in big IoT stream queries, it optimizes the query compiler, execution, and resource acquisition layers. It acquires resources quickly by using Lambda, a new cloud computing resource, and it quickly moves the work of big IoT stream queries to Lambda by reshaping the streaming dataflow to reduce scale-out overhead on Lambda's limited resources. To demonstrate the effectiveness of these optimization techniques, this dissertation develops two systems, Pluto and Sponge. Experiments confirm that applying each optimization technique greatly improves throughput and minimizes latency compared with existing systems.
    Contents:
    Chapter 1 Introduction
      1.1 IoT Stream Workloads
        1.1.1 Small IoT Stream Query
        1.1.2 Big IoT Stream Query
      1.2 Proposed Solution
        1.2.1 IoT-Aware Three-Phase Query Execution
        1.2.2 Streaming Dataflow Reshaping on Lambda
      1.3 Contribution
      1.4 Dissertation Structure
    Chapter 2 Background
      2.1 Stream Query Model
      2.2 Workload Characteristics
        2.2.1 Small IoT Stream Query
        2.2.2 Big IoT Stream Query
    Chapter 3 IoT-Aware Three-Phase Query Execution
      3.1 Pluto Design Overview
      3.2 Decoupling of Code and Query Submission
        3.2.1 Code Registration
        3.2.2 Query Submission API
      3.3 IoT-Aware Execution Model
        3.3.1 Q-Group Creation and Query Grouping
        3.3.2 Q-Group Assignment
        3.3.3 Q-Group Scheduling and Processing
        3.3.4 Load Rebalancing: Q-Group Split and Merging
      3.4 Implementation
      3.5 Evaluation
        3.5.1 Methodology
        3.5.2 Performance Comparison
        3.5.3 Performance Breakdown
        3.5.4 Load Rebalancing: Q-Group Split and Merging
        3.5.5 Tradeoff
      3.6 Discussion
      3.7 Related Work
      3.8 Summary
    Chapter 4 Streaming Dataflow Reshaping for Fast Scaling Mechanism on Lambda
      4.1 Motivation
      4.2 Challenges
      4.3 Design Overview
      4.4 Reshaping Rules
        4.4.1 R1: Inserting Router Operators
        4.4.2 R2: Inserting Transient Operators
        4.4.3 R3: Inserting State Merger Operators
      4.5 Scaling Protocol
        4.5.1 Redirection Protocol
        4.5.2 Merging Protocol
        4.5.3 Migration Protocol
      4.6 Implementation
      4.7 Evaluation
        4.7.1 Methodology
        4.7.2 Performance Analysis
        4.7.3 Performance Breakdown
        4.7.4 Latency-Cost ($) Trade-Off
      4.8 Discussion
      4.9 Related Work
      4.10 Summary
    Chapter 5 Conclusion
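    The dissertation outline names three reshaping rules for Sponge: inserting router operators (R1), transient operators (R2), and state merger operators (R3). The following Python sketch shows one way such rules could fit together for a keyed count, assuming key-agnostic round-robin routing and in-process objects in place of Lambda instances. `CountOperator`, `Router`, `merge_states`, and `handle_burst` are hypothetical names, not Sponge's API.

```python
from collections import Counter
from itertools import cycle
from typing import List


class CountOperator:
    """A stateful keyed-count operator (stand-in for any stateful stream operator)."""

    def __init__(self, name: str):
        self.name = name
        self.state: Counter = Counter()

    def process(self, key: str) -> None:
        self.state[key] += 1


class Router:
    """Sketch of R1: spreads events round-robin over the original and transient
    operators, ignoring keys, so one key's state may end up split across them."""

    def __init__(self, targets: List[CountOperator]):
        self._targets = cycle(targets)

    def route(self, key: str) -> None:
        next(self._targets).process(key)


def merge_states(original: CountOperator, transients: List[CountOperator]) -> Counter:
    """Sketch of R3: fold the transient operators' partial counts back into the
    original operator's state (summing is correct for counts)."""
    merged = Counter(original.state)
    for t in transients:
        merged.update(t.state)  # Counter.update adds counts rather than replacing them
    original.state = merged
    return merged


def handle_burst(events: List[str], n_transient: int) -> Counter:
    original = CountOperator("original")
    # Sketch of R2: short-lived operators, assumed here to be hosted on Lambda during a burst.
    transients = [CountOperator(f"transient-{i}") for i in range(n_transient)]
    router = Router([original, *transients])
    for key in events:
        router.route(key)
    return merge_states(original, transients)


events = ["a", "b", "a", "c", "a", "b"]
result = handle_burst(events, n_transient=2)
print(sorted(result.items()))  # [('a', 3), ('b', 2), ('c', 1)]
```

    Summation is a valid merge here only because counts are commutative and associative; an operator with order-sensitive state would need a different merge function.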