Search CORE

12,326 research outputs found

Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources

Author: Ghosh Rajrup
Komma Siva Prakash Reddy
Simmhan Yogesh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/01/2018
Field of study

The growing deployment of sensors as part of Internet of Things (IoT) is generating thousands of event streams. Complex Event Processing (CEP) queries offer a useful paradigm for rapid decision-making over such data sources. While often centralized in the Cloud, the deployment of capable edge devices on the field motivates the need for cooperative event analytics that span Edge and Cloud computing. Here, we identify a novel problem of query placement on edge and Cloud resources for dynamically arriving and departing analytic dataflows. We define this as an optimization problem to minimize the total makespan for all event analytics, while meeting energy and compute constraints of the resources. We propose 4 adaptive heuristics and 3 rebalancing strategies for such dynamic dataflows, and validate them using detailed simulations for 100 - 1000 edge devices and VMs. The results show that our heuristics offer O(seconds) planning time, give a valid and high quality solution in all cases, and reduce the number of query migrations. Furthermore, rebalance strategies when applied in these heuristics have significantly reduced the makespan by around 20 - 25%.Comment: 11 pages, 7 figure

arXiv.org e-Print Archive

Crossref

DualTable: A Hybrid Storage Model for Update Optimization in Hive

Author: Hu Songlin
Huang Shuo
Jacobsen Hans-Arno
Liang Ying
Liu Wantao
Pei Xubin
Rabl Tilmann
Wang Jiye
Xiao Zheng
Publication venue
Publication date: 01/12/2014
Field of study

Hive is the most mature and prevalent data warehouse tool providing SQL-like interface in the Hadoop ecosystem. It is successfully used in many Internet companies and shows its value for big data processing in traditional industries. However, enterprise big data processing systems as in Smart Grid applications usually require complicated business logics and involve many data manipulation operations like updates and deletes. Hive cannot offer sufficient support for these while preserving high query performance. Hive using the Hadoop Distributed File System (HDFS) for storage cannot implement data manipulation efficiently and Hive on HBase suffers from poor query performance even though it can support faster data manipulation.There is a project based on Hive issue Hive-5317 to support update operations, but it has not been finished in Hive's latest version. Since this ACID compliant extension adopts same data storage format on HDFS, the update performance problem is not solved. In this paper, we propose a hybrid storage model called DualTable, which combines the efficient streaming reads of HDFS and the random write capability of HBase. Hive on DualTable provides better data manipulation support and preserves query performance at the same time. Experiments on a TPC-H data set and on a real smart grid data set show that Hive on DualTable is up to 10 times faster than Hive when executing update and delete operations.Comment: accepted by industry session of ICDE201

arXiv.org e-Print Archive

Crossref

클라우드 환경에서 빠르고 효율적인 IoT 스트림 처리를 위한 엔드-투-엔드 최적화

Author: 엄태건
Publication venue: 서울대학교 대학원
Publication date: 01/08/2021
Field of study

학위논문(박사) -- 서울대학교대학원 : 공과대학 컴퓨터공학부, 2021.8. 엄태건.As a large amount of data streams are generated from Internet of Things (IoT) devices, two types of IoT stream queries are deployed in the cloud. One is a small IoT-stream query, which continuously processes a few IoT data streams of end-users’s IoT devices that have low input rates (e.g., one event per second). The other one is a big IoT-stream query, which is deployed by data scientists to continuously process a large number and huge amount of aggregated data streams that can suddenly fluctuate in a short period of time (bursty loads). However, existing work and stream systems fall short of handling such workloads efficiently because their query submission, compilation, execution, and resource acquisition layer are not optimized for the workloads. This dissertation proposes two end-to-end optimization techniques— not only optimizing stream query execution layer (runtime), but also optimizing query submission, compiler, or resource acquisition layer. First, to minimize the number of cloud machines and maintenance cost of servers in processing many small IoT queries, we build Pluto, a new stream processing system that optimizes both query submission and execution layer for efficiently handling many small IoT stream queries. By decoupling IoT query submission and its code registration and offering new APIs, Pluto mitigates the bottleneck in query submission and enables efficient resource sharing across small IoT stream queries in the execution. Second, to quickly handle sudden bursty loads and scale out big IoT stream queries, we build Sponge, which is a new stream system that optimizes query compilation, execution, and resource acquisition layer altogether. For fast acquisition of new resources, Sponge uses a new cloud computing service, called Lambda, because it offers fast-to-start lightweight containers. Sponge then converts the streaming dataflow of big stream queries to overcome Lambda’s resource constraint and to minimize scaling overheads at runtime. Our evaluations show that the end-to-end optimization techniques significantly improve system throughput and latency compared to existing stream systems in handling a large number of small IoT stream queries and in handling bursty loads of big IoT stream queries.다양한 IoT 디바이스로부터 많은 양의 데이터 스트림들이 생성되면서, 크게 두 가지 타입의 스트림 쿼리가 클라우드에서 수행된다. 첫째로는 작은-IoT 스트림 쿼리이며, 하나의 스트림 쿼리가 적은 양의 IoT 데이터 스트림을 처리하고 많은 수의 작은 스트림 쿼리들이 존재한다. 두번째로는 큰-IoT 스트림 쿼리이며, 하나 의 스트림 쿼리가 많은 양의, 급격히 증가하는 IoT 데이터 스트림들을 처리한다. 하지만, 기존 연구와 스트림 시스템에서는 쿼리 수행, 제출, 컴파일러, 및 리소스 확보 레이어가 이러한 워크로드에 최적화되어 있지 않아서 작은-IoT 및 큰-IoT 스트림 쿼리를 효율적으로 처리하지 못한다. 이 논문에서는 작은-IoT 및 큰-IoT 스트림 쿼리 워크로드를 최적화하기 위한 엔드-투-엔드 최적화 기법을 소개한다. 첫번째로, 많은 수의 작은-IoT 스트림 쿼 리를 처리하기 위해, 쿼리 제출과 수행 레이어를 최적화 하는 기법인 IoT 특성 기반 최적화를 수행한다. 쿼리 제출과 코드 등록을 분리하고, 이를 위한 새로운 API를 제공함으로써, 쿼리 제출에서의 오버헤드를 줄이고 쿼리 수행에서 IoT 특 성 기반으로 리소스를 공유함으로써 오버헤드를 줄인다. 두번째로, 큰-IoT 스트림 쿼리에서 급격히 증가하는 로드를 빠르게 처리하기 위해, 쿼리 컴파일러, 수행, 및 리소스 확보 레이어 최적화를 수행한다. 새로운 클라우드 컴퓨팅 리소스인 람다를 활용하여 빠르게 리소스를 확보하고, 람다의 제한된 리소스에서 스케일-아웃 오 버헤드를 줄이기 위해 스트림 데이터플로우를 바꿈으로써 큰-IoT 스트림 쿼리의 작업량을 빠르게 람다로 옮긴다. 최적화 기법의 효과를 보여주기 위해, 이 논문에서는 두가지 시스템-Pluto 와 Sponge-을 개발하였다. 실험을 통해서, 각 최적화 기법을 적용한 결과 기존 시스템 대비 처리량을 크게 향상시켰으며, 지연시간을 최소화하는 것을 확인하였다.Chapter 1 Introduction 1 1.1 IoT Stream Workloads 1 1.1.1 Small IoT Stream Query 2 1.1.2 Big IoT Stream Query 4 1.2 Proposed Solution 5 1.2.1 IoT-Aware Three-Phase Query Execution 6 1.2.2 Streaming Dataflow Reshaping on Lambda 7 1.3 Contribution 8 1.4 Dissertation Structure 9 Chapter 2 Background 10 2.1 Stream Query Model 10 2.2 Workload Characteristics 12 2.2.1 Small IoT Stream Query 12 2.2.2 Big IoT Stream Query 13 Chapter 3 IoT-Aware Three-Phase Query Execution 15 3.1 Pluto Design Overview 16 3.2 Decoupling of Code and Query Submission 19 3.2.1 Code Registration 19 3.2.2 Query Submission API 20 3.3 IoT-Aware Execution Model 21 3.3.1 Q-Group Creation and Query Grouping 24 3.3.2 Q-Group Assignment 24 3.3.3 Q-Group Scheduling and Processing 25 3.3.4 Load Rebalancing: Q-Group Split and Merging 28 3.4 Implementation 29 3.5 Evaluation 30 3.5.1 Methodology 30 3.5.2 Performance Comparison 34 3.5.3 Performance Breakdown 36 3.5.4 Load Rebalancing: Q-Group Split and Merging 38 3.5.5 Tradeoff 40 3.6 Discussion 41 3.7 Related Work 43 3.8 Summary 44 Chapter 4 Streaming Dataflow Reshaping for Fast Scaling Mechanism on Lambda 46 4.1 Motivation 46 4.2 Challenges 47 4.3 Design Overview 50 4.4 Reshaping Rules 51 4.4.1 R1:Inserting Router Operators 52 4.4.2 R2:Inserting Transient Operators 54 4.4.3 R3:Inserting State Merger Operators 57 4.5 Scaling Protocol 59 4.5.1 Redirection Protocol 59 4.5.2 Merging Protocol 60 4.5.3 Migration Protocol 61 4.6 Implementation 61 4.7 Evaluation 63 4.7.1 Methodology 63 4.7.2 Performance Analysis 68 4.7.3 Performance Breakdown 70 4.7.4 Latency-Cost($) Trade-Off 76 4.8 Discussion 77 4.9 Related Work 78 4.10 Summary 80 Chapter 5 Conclusion 81박

SNU Open Repository and Archive