Search CORE

43,453 research outputs found

End-to-End Optimization for Fast and Efficient IoT Stream Processing in the Cloud

Author: 엄태건
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2021

Doctoral dissertation, Seoul National University Graduate School: Department of Computer Science and Engineering, College of Engineering, August 2021. 엄태건.

As large volumes of data streams are generated by Internet of Things (IoT) devices, two types of IoT stream queries are deployed in the cloud. One is the small IoT stream query, which continuously processes a few data streams from an end user's IoT devices at low input rates (e.g., one event per second). The other is the big IoT stream query, which data scientists deploy to continuously process a large number of aggregated data streams whose volume can fluctuate suddenly within a short period (bursty loads). However, existing work and stream systems fall short of handling such workloads efficiently because their query submission, compilation, execution, and resource acquisition layers are not optimized for them. This dissertation proposes two end-to-end optimization techniques that optimize not only the stream query execution layer (runtime) but also the query submission, compiler, or resource acquisition layer.

First, to minimize the number of cloud machines and the server maintenance cost of processing many small IoT queries, we build Pluto, a new stream processing system that optimizes both the query submission and execution layers. By decoupling IoT query submission from code registration and offering new APIs, Pluto mitigates the bottleneck in query submission and enables efficient resource sharing across small IoT stream queries during execution.

Second, to quickly handle sudden bursty loads and scale out big IoT stream queries, we build Sponge, a new stream system that jointly optimizes the query compilation, execution, and resource acquisition layers. To acquire new resources quickly, Sponge uses a new cloud computing service called Lambda, which offers fast-to-start lightweight containers. Sponge then reshapes the streaming dataflow of big stream queries to overcome Lambda's resource constraints and to minimize scaling overheads at runtime.

Our evaluations show that the end-to-end optimization techniques significantly improve system throughput and latency compared to existing stream systems, both in handling a large number of small IoT stream queries and in handling bursty loads of big IoT stream queries.

As a wide range of IoT devices generate large volumes of data streams, two main types of stream queries run in the cloud. The first is the small IoT stream query: each query processes a small amount of IoT stream data, and there are many such small queries. The second is the big IoT stream query: a single query processes a large volume of IoT data streams whose load can increase sharply. However, existing work and stream systems cannot handle small and big IoT stream queries efficiently, because their query execution, submission, compiler, and resource acquisition layers are not optimized for these workloads. This dissertation introduces end-to-end optimization techniques for small and big IoT stream query workloads. First, to handle a large number of small IoT stream queries, it applies IoT-aware optimization, a technique that optimizes the query submission and execution layers. Separating query submission from code registration, and providing new APIs for doing so, reduces the overhead of query submission; sharing resources during execution based on IoT characteristics reduces execution overhead. Second, to quickly handle sharply increasing loads in big IoT stream queries, it optimizes the query compiler, execution, and resource acquisition layers. It acquires resources quickly by using Lambda, a new cloud computing resource, and it reshapes the streaming dataflow to reduce scale-out overhead under Lambda's limited resources, so that the workload of big IoT stream queries can be moved to Lambda quickly. To demonstrate the effectiveness of these techniques, the dissertation develops two systems, Pluto and Sponge. Experiments confirm that applying each technique greatly improves throughput and minimizes latency compared with existing systems.

Contents
Chapter 1 Introduction
1.1 IoT Stream Workloads
1.1.1 Small IoT Stream Query
1.1.2 Big IoT Stream Query
1.2 Proposed Solution
1.2.1 IoT-Aware Three-Phase Query Execution
1.2.2 Streaming Dataflow Reshaping on Lambda
1.3 Contribution
1.4 Dissertation Structure
Chapter 2 Background
2.1 Stream Query Model
2.2 Workload Characteristics
2.2.1 Small IoT Stream Query
2.2.2 Big IoT Stream Query
Chapter 3 IoT-Aware Three-Phase Query Execution
3.1 Pluto Design Overview
3.2 Decoupling of Code and Query Submission
3.2.1 Code Registration
3.2.2 Query Submission API
3.3 IoT-Aware Execution Model
3.3.1 Q-Group Creation and Query Grouping
3.3.2 Q-Group Assignment
3.3.3 Q-Group Scheduling and Processing
3.3.4 Load Rebalancing: Q-Group Split and Merging
3.4 Implementation
3.5 Evaluation
3.5.1 Methodology
3.5.2 Performance Comparison
3.5.3 Performance Breakdown
3.5.4 Load Rebalancing: Q-Group Split and Merging
3.5.5 Tradeoff
3.6 Discussion
3.7 Related Work
3.8 Summary
Chapter 4 Streaming Dataflow Reshaping for Fast Scaling Mechanism on Lambda
4.1 Motivation
4.2 Challenges
4.3 Design Overview
4.4 Reshaping Rules
4.4.1 R1: Inserting Router Operators
4.4.2 R2: Inserting Transient Operators
4.4.3 R3: Inserting State Merger Operators
4.5 Scaling Protocol
4.5.1 Redirection Protocol
4.5.2 Merging Protocol
4.5.3 Migration Protocol
4.6 Implementation
4.7 Evaluation
4.7.1 Methodology
4.7.2 Performance Analysis
4.7.3 Performance Breakdown
4.7.4 Latency-Cost($) Trade-Off
4.8 Discussion
4.9 Related Work
4.10 Summary
Chapter 5 Conclusion
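The decoupling of code registration from query submission described in the abstract above can be illustrated with a small sketch. This is not Pluto's actual API: the class `SharedQueryServer`, its methods, and the `Query` fields are hypothetical names chosen for illustration, and the grouping by code identifier is only a loose stand-in for the dissertation's Q-groups.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Query:
    query_id: str
    code_id: str    # reference to code registered earlier
    stream_id: str  # the IoT stream this query reads


class SharedQueryServer:
    """Hypothetical server; names are illustrative, not Pluto's API."""

    def __init__(self) -> None:
        self._code: Dict[str, Callable[[float], float]] = {}
        self._groups: Dict[str, List[Query]] = defaultdict(list)

    def register_code(self, code_id: str, fn: Callable[[float], float]) -> None:
        # Done once per application, so the code is not re-sent with every query.
        self._code[code_id] = fn

    def submit_query(self, query: Query) -> None:
        # A submission carries only identifiers, which keeps it lightweight.
        if query.code_id not in self._code:
            raise KeyError(f"unregistered code: {query.code_id}")
        self._groups[query.code_id].append(query)

    def process(self, stream_id: str, value: float) -> Dict[str, float]:
        # Queries with the same code_id share one function object.
        results: Dict[str, float] = {}
        for code_id, queries in self._groups.items():
            fn = self._code[code_id]
            for q in queries:
                if q.stream_id == stream_id:
                    results[q.query_id] = fn(value)
        return results


server = SharedQueryServer()
server.register_code("c_to_f", lambda c: c * 9 / 5 + 32)
server.submit_query(Query("q1", "c_to_f", "sensor-1"))
server.submit_query(Query("q2", "c_to_f", "sensor-2"))
out = server.process("sensor-1", 100.0)
```

In this sketch the two queries share a single registered function, and only the query whose stream matches the incoming event produces a result.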

Continuous Learning of HPC Infrastructure Models using Big Data Analytics and In-Memory processing Tools

Author: Beneventi, Francesco; Bartolini, Andrea; Cavazzoni, Carlo; Benini, Luca
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/12/2017

This work was supported, in part, by the FP7 ERC Advance project MULTITHERMAN (g.a. 291125), by the EU H2020 FETHPC project ANTAREX (g.a. 67623) and by the EU H2020 FETHPC project Exanode (g.a. 671578).

Exascale computing represents the next leap in the HPC race. Reaching this level of performance is subject to several engineering challenges such as energy consumption, equipment cooling, reliability and massive parallelism. Model-based optimization is an essential tool in the design and control of energy-efficient, reliable and thermally constrained systems. However, in the Exascale domain, model learning techniques tailored to a specific supercomputer require real measurements and must therefore handle and analyze a massive amount of data coming from the HPC monitoring infrastructure. This rapidly becomes a 'big data' scale problem. The common approach, in which measurements are first stored in large databases and then processed, is no longer affordable because of increasing storage costs and the lack of real-time support. Instead, cloud-based machine learning techniques now aim to build on-line models using real-time approaches such as 'stream processing' and 'in-memory' computing, which avoid storage costs and enable fast data processing. Moreover, the fast delivery of the models and their adaptation to quick data variations make the decision stage of the optimization loop more effective and reliable. In this paper we leverage scalable, lightweight and flexible IoT technologies, such as the MQTT protocol, to build a highly scalable HPC monitoring infrastructure able to handle the massive sensor data produced by next-gen HPC components. We then show how state-of-the-art tools for big data computing and analysis, such as Apache Spark, can be used to manage the huge amount of data delivered by the monitoring layer and to build adaptive models in real time using on-line machine learning techniques.
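The idea of building a model on-line from a measurement stream, without first storing the measurements, can be sketched as follows. This is a plain-Python stand-in that uses stochastic gradient descent on a one-variable linear model; it is not the paper's method, it does not use MQTT or Apache Spark, and the class name `OnlineLinearModel`, the learning rate, and the synthetic data are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


class OnlineLinearModel:
    """Hypothetical on-line model y ~ w * x + b, updated one sample at a time."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        # One gradient step on the squared error of this sample only;
        # the sample can be discarded afterwards, so nothing is stored.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def synthetic_stream(n: int) -> Iterable[Tuple[float, float]]:
    # Stand-in for sensor readings: a noise-free relation y = 2x + 1.
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    for i in range(n):
        x = xs[i % len(xs)]
        yield x, 2.0 * x + 1.0


model = OnlineLinearModel()
for x, y in synthetic_stream(5000):
    model.update(x, y)
```

Because each update touches only the current sample, memory use stays constant regardless of how many measurements arrive, which is the property the abstract attributes to stream-based approaches.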

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams

Author: Prasanna Viktor, Simmhan Yogesh, Zhou Qunzhi
Publication venue: 'Elsevier BV'
Publication date: 02/11/2016

Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems (CPS) present novel challenges to Big Data platforms for performing online analytics. Ubiquitous sensors in IoT deployments generate data streams at high velocity that include information from a variety of domains and accumulate to large volumes on disk. Complex Event Processing (CEP) is recognized as an important real-time computing paradigm for analyzing continuous data streams. However, existing work on CEP is largely limited to relational query processing, which leaves two distinct gaps in query specification and execution: (1) infusing the relational query model with higher-level knowledge semantics, and (2) seamless query evaluation across temporal spaces that span past, present and future events. Closing these gaps allows accessible analytics over data streams whose properties come from different disciplines, and helps span the velocity (real-time) and volume (persistent) dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP) framework that provides domain-aware knowledge query constructs along with temporal operators that allow end-to-end queries to span real-time and persistent streams. We translate this query model to efficient query execution over online and offline data streams, proposing several optimizations to mitigate the overheads introduced by evaluating semantic predicates and by accessing high-volume historic data streams. The proposed X-CEP query model and execution approaches are implemented in our prototype semantic CEP engine, SCEPter. We validate our query model using domain-aware CEP queries from a real-world Smart Power Grid application, and experimentally analyze the benefits of our optimizations for executing these queries, using event streams from a campus-microgrid IoT deployment.

Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems, October 27, 201
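A rough sketch of the two ideas in the abstract above, a semantic predicate backed by domain knowledge and a pattern that is evaluated across stored and live events, might look like this. It is not SCEPter's query language or engine: the `KNOWLEDGE` table, the `is_a` predicate, the `detect` function, and the event fields are hypothetical, and the pattern (a run of consecutive readings above a threshold) is chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Dict, Iterable, Iterator, Set


@dataclass(frozen=True)
class Event:
    timestamp: int
    source: str
    value: float


# Hypothetical domain knowledge: which sources belong to which concept.
KNOWLEDGE: Dict[str, Set[str]] = {
    "HVAC": {"ahu-1", "chiller-2"},
    "Lighting": {"lamp-7"},
}


def is_a(source: str, concept: str) -> bool:
    # Semantic predicate: resolved against the knowledge table, not the event.
    return source in KNOWLEDGE.get(concept, set())


def detect(
    persistent: Iterable[Event],
    live: Iterable[Event],
    concept: str,
    threshold: float,
    window: int,
) -> Iterator[int]:
    """Yield timestamps where `window` consecutive matching events exceed `threshold`."""
    run = 0
    # Stored events are replayed first, then evaluation continues on live ones,
    # so a single match can start in the past and complete in real time.
    for event in chain(persistent, live):
        if not is_a(event.source, concept):
            continue
        run = run + 1 if event.value > threshold else 0
        if run >= window:
            yield event.timestamp


history = [Event(1, "ahu-1", 5.0), Event(2, "lamp-7", 9.0), Event(3, "chiller-2", 6.0)]
incoming = [Event(4, "ahu-1", 7.0), Event(5, "ahu-1", 1.0)]
matches = list(detect(history, incoming, "HVAC", threshold=4.0, window=3))
```

Here the match at timestamp 4 relies on two stored events and one live event, and the lighting reading is filtered out by the semantic predicate rather than by any field of the event itself.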

arXiv.org e-Print Archive