135 research outputs found

    Scalable and fault-tolerant data stream processing on multi-core architectures

    With increasing data volumes and velocity, many applications are shifting from the classical “process-after-store” paradigm to a stream processing model: data is produced and consumed as continuous streams. Stream processing captures latency-sensitive applications as diverse as credit card fraud detection and high-frequency trading. These applications are expressed as queries of algebraic operations (e.g., aggregation) over the most recent data using windows, i.e., finite evolving views over the input streams. To guarantee correct results, streaming applications require precise window semantics (e.g., temporal ordering) for operations that maintain state. While high processing throughput and low latency are performance desiderata for stateful streaming applications, achieving both poses challenges. Computing the state of overlapping windows causes redundant aggregation operations: incremental execution (i.e., reusing previous results) reduces latency but prevents parallelization; at the same time, parallelizing window execution for stateful operations with precise semantics demands ordering guarantees and state access coordination. Finally, streams and state must be recovered to produce consistent and repeatable results in the event of failures. Given the rise of shared-memory multi-core CPU architectures and high-speed networking, we argue that it is possible to address these challenges in a single node without compromising window semantics, performance, or fault-tolerance. In this thesis, we analyze, design, and implement stream processing engines (SPEs) that achieve high performance on multi-core architectures. To this end, we introduce new approaches for in-memory processing that address the previous challenges: (i) for overlapping windows, we provide a family of window aggregation techniques that enable computation sharing based on the algebraic properties of aggregation functions; (ii) for parallel window execution, we balance parallelism and incremental execution by developing abstractions for both and combining them into a novel design; and (iii) for reliable single-node execution, we enable strong fault-tolerance guarantees without sacrificing performance by reducing the required disk I/O bandwidth using a novel persistence model. We combine the above to implement an SPE that processes hundreds of millions of tuples per second with sub-second latencies. These results reveal the opportunity to reduce resource and maintenance footprint by replacing cluster-based SPEs with single-node deployments.
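
    The computation-sharing idea for overlapping windows can be pictured with a small sketch. The Python code below is not the thesis's implementation; it is a minimal example, with illustrative names and parameters, of reusing per-pane partial aggregates across overlapping windows, which is valid because summation is associative.

        from collections import deque

        def sliding_sums(stream, window_size=8, slide=2):
            """Sum of each overlapping window of `window_size` tuples, advancing by
            `slide` tuples, computed by reusing per-pane partial aggregates."""
            assert window_size % slide == 0
            panes_per_window = window_size // slide
            panes = deque(maxlen=panes_per_window)    # one partial aggregate per pane
            pane_sum, pane_count = 0, 0
            for value in stream:
                pane_sum += value
                pane_count += 1
                if pane_count == slide:               # pane complete: publish its partial
                    panes.append(pane_sum)
                    pane_sum, pane_count = 0, 0
                    if len(panes) == panes_per_window:
                        yield sum(panes)              # combine shared partial aggregates

        if __name__ == "__main__":
            print(list(sliding_sums(range(20), window_size=8, slide=2)))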

    Distributed Processing and Analytics of IoT data in Edge Cloud

    Sensors of different kinds connect to the IoT network and generate a large number of data streams. We explore the possibility of performing stream processing at the network edge and propose an architecture to do so. This thesis work is based on a prototype solution developed by Nokia. The system operates close to the data sources and retrieves data based on requests made by applications through the system. Processing data close to where it is generated can save bandwidth and assist in decision making. This work proposes a processing component operating at the far edge. The applicability of the prototype solution with the proposed processing component is illustrated in three use cases: analysis of Key Performance Indicator values, processing of data streams generated by air quality sensors called Sensordrones, and recognition of car license plates using deep learning.
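
    As a rough illustration of the bandwidth argument above (not the Nokia prototype itself), the Python sketch below pre-aggregates raw sensor readings at the edge so that only compact summary records are forwarded upstream; the batch size, field names, and sample values are assumptions made for the example.

        import statistics

        def summarize_at_edge(readings, batch_size=100):
            """Reduce each batch of raw readings to one summary record."""
            for i in range(0, len(readings), batch_size):
                batch = readings[i:i + batch_size]
                yield {
                    "count": len(batch),
                    "min": min(batch),
                    "max": max(batch),
                    "mean": statistics.fmean(batch),
                }

        if __name__ == "__main__":
            raw = [20.0 + (i % 7) * 0.1 for i in range(1000)]   # hypothetical sensor values
            summaries = list(summarize_at_edge(raw))
            print(f"{len(raw)} readings reduced to {len(summaries)} summary records")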

    SPL: An extensible language for distributed stream processing

    Big data is revolutionizing how all sectors of our economy do business, including telecommunication, transportation, medical, and finance. Big data comes in two flavors: data at rest and data in motion. Processing data in motion is stream processing. Stream processing for big data analytics often requires scale that can only be delivered by a distributed system, exploiting parallelism on many hosts and many cores. One such distributed stream processing system is IBM Streams. Early customer experience with IBM Streams uncovered that another core requirement is extensibility, since customers want to build high-performance domain-specific operators for use in their streaming applications. Based on these two core requirements of distribution and extensibility, we designed and implemented the Streams Processing Language (SPL). This article describes SPL with an emphasis on the language design, distributed runtime, and extensibility mechanism. SPL is now the gateway for the IBM Streams platform, used by our customers for stream processing in a broad range of application domains. © 2017 ACM
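
    The extensibility requirement can be pictured with a short sketch. The code below is plain Python, not SPL syntax or the IBM Streams API; it only illustrates the general idea of plugging a user-defined, domain-specific operator into a generic streaming pipeline, and every name in it is invented for the example.

        from typing import Callable, Iterable, Iterator

        Operator = Callable[[Iterator[dict]], Iterator[dict]]

        def pipeline(source: Iterable[dict], *operators: Operator) -> Iterator[dict]:
            """Chain operators so each consumes the previous operator's output."""
            stream: Iterator[dict] = iter(source)
            for op in operators:
                stream = op(stream)
            return stream

        def flag_large_trades(stream: Iterator[dict]) -> Iterator[dict]:
            """A user-defined, domain-specific operator: flag unusually large trades."""
            for tup in stream:
                if tup["volume"] > 10_000:
                    yield {**tup, "flagged": True}

        if __name__ == "__main__":
            trades = [{"symbol": "ABC", "volume": v} for v in (500, 25_000, 12_000)]
            for out in pipeline(trades, flag_large_trades):
                print(out)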

    Robust Complex Event Pattern Detection over Streams

    Event stream processing (ESP) has become increasingly important in modern applications. In this dissertation, I focus on providing a robust ESP solution by meeting three major research challenges regarding the robustness of ESP systems: (1) exploiting event constraints on the input stream, when such semantic information is available, during event processing; (2) handling event streams with out-of-order data arrival; and (3) handling event streams with interval-based temporal semantics. The following are the three corresponding research tasks completed in the dissertation. Task I - Constraint-Aware Complex Event Pattern Detection over Streams: a framework for constraint-aware pattern detection over event streams is designed, which checks query satisfiability / unsatisfiability on the fly using a lightweight reasoning mechanism and adjusts the processing strategy dynamically by producing early feedback, releasing unnecessary system resources, and terminating the corresponding pattern monitors. Task II - Complex Event Pattern Detection over Streams with Out-of-Order Data Arrival: a mechanism for processing event queries over streams that may contain out-of-order data is studied, which provides new physical implementation strategies for the core stream algebra operators such as sequence scan, pattern construction, and negation filtering. Task III - Complex Event Pattern Detection over Streams with Interval-Based Temporal Semantics: an expressive language to represent the required temporal patterns among streaming interval events is introduced and the corresponding temporal operator ISEQ is designed.
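
    A highly simplified sketch of sequence-pattern detection may help fix ideas; it is not the dissertation's operator implementation. The Python code below matches the pattern SEQ(A, B) within a time window over timestamped events, and it buffers and sorts the input as a naive stand-in for the dedicated out-of-order handling studied in Task II; the event types and window length are illustrative.

        def seq_matches(events, first_type, second_type, window):
            """Yield (a, b) pairs where an event of `first_type` is followed by an
            event of `second_type` within `window` time units."""
            pending = []                                          # open partial matches
            for event in sorted(events, key=lambda e: e["ts"]):   # naive ordering step
                # drop partial matches that can no longer complete in time
                pending = [a for a in pending if event["ts"] - a["ts"] <= window]
                if event["type"] == second_type:
                    for a in pending:
                        yield (a, event)
                if event["type"] == first_type:
                    pending.append(event)

        if __name__ == "__main__":
            stream = [
                {"ts": 1, "type": "A"},
                {"ts": 3, "type": "B"},
                {"ts": 9, "type": "B"},   # too late to pair with ts=1 when window=5
            ]
            print(list(seq_matches(stream, "A", "B", window=5)))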

    Development of a supervisory internet of things (IoT) system for factories of the future

    Big data is of great importance to stakeholders, including manufacturers, business partners, consumers, and governments. It brings many benefits, such as improving productivity and reducing product costs through digitalised automation equipment and manufacturing information systems. Other benefits include using social media to build agile cooperation between suppliers and retailers and between product designers and production engineers, tracking customer feedback in a timely manner, and reducing environmental impacts by using Internet of Things (IoT) sensors to monitor energy consumption and noise levels. However, manufacturing big data integration has been neglected. Many open-source big data tools provide complex capabilities for managing big data in various data-driven manufacturing applications. In this research, a manufacturing big data integration system, named the Data Control Module (DCM), has been designed and developed. The system can securely integrate data silos from various manufacturing systems and control the data for different manufacturing applications. Firstly, the architecture of a manufacturing big data system is proposed, consisting of three parts: manufacturing data sources, a manufacturing big data ecosystem, and manufacturing applications. Secondly, nine essential components are identified in the big data ecosystem for building various manufacturing big data solutions. Thirdly, a conceptual framework is proposed based on the big data ecosystem to guide the design of the DCM. The DCM has then been designed and developed with selected big data software to integrate all three varieties of manufacturing data: unstructured, semi-structured, and structured. The DCM has been validated in three general manufacturing domains: product design and development, production, and business. The DCM can be used not only with legacy manufacturing software but also in emerging areas such as the digital twin and digital thread. The limitations of the DCM are analysed, and further research directions are discussed.
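
    To make the three data varieties concrete, the following Python sketch classifies incoming records as structured, semi-structured, or unstructured so that each could be routed to a suitable store. It is a conceptual illustration only, not the DCM implementation, and both the heuristics and the sample records are assumptions.

        import json

        def classify_variety(record: str) -> str:
            """Label a raw record as 'structured', 'semi-structured', or 'unstructured'."""
            try:
                json.loads(record)
                return "semi-structured"   # e.g. JSON from sensors or MES APIs
            except ValueError:
                pass
            if "," in record:
                return "structured"        # crude stand-in for a relational/CSV row
            return "unstructured"          # free text, images, CAD notes, ...

        if __name__ == "__main__":
            samples = [
                '{"machine": "CNC-7", "temp_c": 41.2}',
                "1001,gear,250",
                "Operator noted unusual vibration on line 3 during the night shift.",
            ]
            for s in samples:
                print(classify_variety(s), "<-", s)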

    Programming Languages and Systems

    This open access book constitutes the proceedings of the 29th European Symposium on Programming, ESOP 2020, which was planned to take place in Dublin, Ireland, in April 2020, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The actual ETAPS 2020 meeting was postponed due to the COVID-19 pandemic. The papers deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.

    Computer Aided Verification

    This open access two-volume set, LNCS 11561 and 11562, constitutes the refereed proceedings of the 31st International Conference on Computer Aided Verification, CAV 2019, held in New York City, USA, in July 2019. The 52 full papers presented, together with 13 tool papers and 2 case studies, were carefully reviewed and selected from 258 submissions. The papers were organized in the following topical sections: Part I: automata and timed systems; security and hyperproperties; synthesis; model checking; cyber-physical systems and machine learning; probabilistic systems, runtime techniques; dynamical, hybrid, and reactive systems; Part II: logics, decision procedures, and solvers; numerical programs; verification; distributed systems and networks; verification and invariants; and concurrency.

    Estimating the win probability in a hockey game

    When a hockey game is being played, its data arrives continuously. Therefore, it is possible to use stream mining methods to estimate the win probability (WP) of a team once the game begins. Based on eight seasons of NHL data from 2003 to 2014, we provide three methods to estimate the win probability in a hockey game. The first model is a statistical win probability calculation built from summaries of the historical data. The second model is a win probability calculation based on data mining classification techniques: we applied several classification algorithms to our data, compared the results, and chose the best algorithm to build the win probability model. Naive Bayes, SVM, VFDT, and Random Tree classifiers are compared on the hockey dataset in this thesis. The last model uses a stream mining technique for real-time prediction, which can be interpreted as a training-update-training model. Every 20 events in a hockey game form a window. We use the previous window as the training data set to learn decision tree rules for classifying the current window. A parameter is then calculated from the rules trained on these two windows, which tells us which rule is better for training the next window. In our models, the variables time, lead size, number of shots, number of misses, and number of penalties are combined to calculate the win probability. Our WP estimates can provide useful evaluations of plays, prediction of game results and, in some cases, guidance for coaching decisions.
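
    The window-by-window training-update-training loop can be sketched compactly. The Python example below is not the thesis's implementation: a one-feature decision stump stands in for the decision-tree rules, the window length of 20 events follows the abstract, and the feature and label names are invented for illustration.

        WINDOW = 20

        def train_stump(window):
            """Pick the lead-size threshold that best separates wins in this window
            (a depth-1 decision tree)."""
            best_threshold, best_correct = 0, -1
            for threshold in {e["lead_size"] for e in window}:
                correct = sum((e["lead_size"] >= threshold) == e["home_win"] for e in window)
                if correct > best_correct:
                    best_threshold, best_correct = threshold, correct
            return best_threshold

        def rolling_accuracy(events):
            """Classify each window with the stump trained on the previous window."""
            threshold, scores = None, []
            for start in range(0, len(events) - WINDOW + 1, WINDOW):
                window = events[start:start + WINDOW]
                if threshold is not None:
                    correct = sum((e["lead_size"] >= threshold) == e["home_win"] for e in window)
                    scores.append(correct / len(window))
                threshold = train_stump(window)   # update the model for the next window
            return scores

        if __name__ == "__main__":
            # hypothetical per-event records: current lead size and eventual outcome
            events = [{"lead_size": (i % 7) - 3, "home_win": (i % 7) >= 3} for i in range(100)]
            print(rolling_accuracy(events))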