

High availability of data using Automatic Selection Algorithm (ASA) in distributed stream processing systems

Authors: Alhumyani Hesham, Alshamrani Sultan, Mohamed Isbudeen Noor, Waseem Quadri
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/06/2019

High availability of data is one of the most critical requirements of a distributed stream processing system (DSPS). High availability can be achieved with the available recovery techniques: active standby, passive standby, and upstream backup. Each recovery technique has its own advantages and disadvantages, and each is suited to different types of failures depending on their type and nature. This paper presents an Automatic Selection Algorithm (ASA) that helps select the best recovery technique based on the type of failure. We intend to use all of the available recovery approaches (active standby, passive standby, and upstream backup) together at the nodes of a DSPS, choosing among them based on the system requirements and the failure type. In this way, a single solution achieves the benefits of the fastest recovery, precise recovery, and lower runtime overhead. We evaluate ASA as an algorithm selector during stream-processing runtime, and we also evaluate its efficiency with respect to time. The experimental results show that our approach is 95% efficient, faster than conventional manual failure recovery approaches, and fully automatic.

Bulletin of Electrical Engineering and Informatics
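The abstract does not spell out ASA's selection rules. The Python sketch below shows what a rule-based selector of this kind could look like. The failure categories, the 100 ms threshold, the `Requirements` fields, and the rules themselves are all hypothetical. They were chosen to reflect the commonly cited trade-offs: active standby recovers fastest but needs a spare replica, passive standby restores from checkpoints, and upstream backup has the lowest runtime overhead but depends on upstream buffers surviving.

```python
from dataclasses import dataclass
from enum import Enum


class Technique(Enum):
    ACTIVE_STANDBY = "active standby"
    PASSIVE_STANDBY = "passive standby"
    UPSTREAM_BACKUP = "upstream backup"


class FailureType(Enum):
    TRANSIENT = "transient"    # hypothetical category, e.g. a brief network glitch
    NODE_CRASH = "node crash"  # hypothetical category, e.g. a single machine crash
    CORRELATED = "correlated"  # hypothetical category, e.g. a rack or switch outage


@dataclass
class Requirements:
    max_recovery_ms: float  # hypothetical recovery-latency budget
    spare_nodes: int        # idle nodes available to host an active replica


def select_technique(failure: FailureType, req: Requirements) -> Technique:
    """Hypothetical rule table for illustration; not the published ASA."""
    # Transient failures: replaying a short window from upstream buffers is cheap.
    if failure is FailureType.TRANSIENT:
        return Technique.UPSTREAM_BACKUP
    # Tight latency budget plus spare capacity: a running replica takes over fastest.
    if req.max_recovery_ms < 100 and req.spare_nodes > 0:
        return Technique.ACTIVE_STANDBY
    # Correlated failures may also take out upstream buffers, so restore from checkpoints.
    if failure is FailureType.CORRELATED:
        return Technique.PASSIVE_STANDBY
    # Otherwise minimize runtime overhead.
    return Technique.UPSTREAM_BACKUP
```

For example, `select_technique(FailureType.NODE_CRASH, Requirements(50, 2))` picks active standby, while the same crash with a 500 ms budget and no spare nodes falls through to upstream backup.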

S-Store: Streaming Meets Transaction Processing

Authors: Aslantas Cansu, Cetintemel Ugur, Du Jiang, Kraska Tim, Madden Samuel, Maier David, Meehan John, Pavlo Andrew, Stonebraker Michael, Tatbul Nesime, Tufte Kristin, Wang Hao, Zdonik Stan
Publication date: 01/01/2015

Stream processing addresses the needs of real-time applications. Transaction processing addresses the coordination and safety of short atomic computations. Heretofore, these two modes of operation existed in separate, stove-piped systems. In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. In this way, S-Store can simultaneously accommodate OLTP and streaming applications. We present a simple transaction model for streams that integrates seamlessly with a traditional OLTP system. We chose to build S-Store as an extension of H-Store, an open-source, in-memory, distributed OLTP database system. By implementing S-Store in this way, we can make use of the transaction processing facilities that H-Store already supports, and we can concentrate on the additional implementation features that are needed to support streaming. Similar implementations could be done using other main-memory OLTP platforms. We show that we can actually achieve higher throughput for streaming workloads in S-Store than an equivalent deployment in H-Store alone. We also show how this can be achieved within H-Store with the addition of a modest amount of new functionality. Furthermore, we compare S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm, and show how S-Store matches and sometimes exceeds their performance while providing stronger transactional guarantees

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Crossref

PDXScholar (Portland State University)
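The abstract describes S-Store's transaction model for streams only in outline. The Python sketch below assumes the simplest reading: each batch of stream tuples runs as one all-or-nothing transaction, and batches commit in stream order. `StreamTxnExecutor` and `count_votes` are hypothetical names. This is not S-Store's actual interface, which is built on H-Store's transaction processing facilities.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Procedure = Callable[[State, List[dict]], None]


class StreamTxnExecutor:
    """Toy executor: each input batch is one atomic transaction, applied in batch-id order."""

    def __init__(self, procedure: Procedure) -> None:
        self.procedure = procedure
        self.state: State = {}
        self.next_batch_id = 0

    def submit(self, batch_id: int, tuples: List[dict]) -> bool:
        # Batches must commit in stream order.
        if batch_id != self.next_batch_id:
            raise ValueError(f"expected batch {self.next_batch_id}, got {batch_id}")
        working = dict(self.state)  # writes go to a private copy until commit
        try:
            self.procedure(working, tuples)
        except ValueError:
            # Abort: shared state is untouched and the same batch may be resubmitted.
            return False
        self.state = working  # commit all of the batch's writes at once
        self.next_batch_id += 1
        return True


def count_votes(state: State, tuples: List[dict]) -> None:
    """Hypothetical stream procedure: tally votes per candidate, rejecting bad tuples."""
    for t in tuples:
        if t["votes"] < 0:
            raise ValueError("negative vote count")
        state[t["candidate"]] = state.get(t["candidate"], 0) + t["votes"]
```

If a batch contains one invalid tuple, none of that batch's earlier writes become visible. This is the guarantee that separates a transactional stream engine from simply applying tuples one at a time.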

Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

Authors: Cao Jianneng, Madsen Kasper Grud Skat, Zhou Yongluan
Publication date: 11/02/2016

Load balancing, operator instance collocation, and horizontal scaling are critical issues in Parallel Stream Processing Engines for achieving low data processing latency, optimized cluster utilization, and minimized communication cost, respectively. In previous work, these issues are typically tackled separately and independently. We argue that these problems are tightly coupled, in the sense that they all need to determine workload allocations and migrate computational state at runtime. Optimizing them independently would result in suboptimal solutions. Therefore, in this paper, we investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocations have little effect on the communication cost, and model the load-balancing problem as a Mixed-Integer Linear Program. Afterwards, we present an extended solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. Extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches.

arXiv.org e-Print Archive

Crossref

University of Southern Denmark Research Output
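The paper models load balancing as a Mixed-Integer Linear Program. Solving that requires an MILP solver, so the sketch below swaps in a much simpler technique, the greedy longest-processing-time-first heuristic, to show what "allocating workload" means concretely: key groups with measured loads are assigned to operator instances. The key names and loads are made up. The sketch also ignores the state-migration and communication costs that ALBIC optimizes jointly.

```python
import heapq
from typing import Dict, List, Tuple


def greedy_balance(key_loads: Dict[str, float], n_instances: int) -> Dict[int, List[str]]:
    """LPT heuristic: place the heaviest key groups first, each on the least-loaded instance."""
    # Min-heap of (current load, instance id); ties resolve to the lower instance id.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(n_instances)]
    heapq.heapify(heap)
    assignment: Dict[int, List[str]] = {i: [] for i in range(n_instances)}
    # Sort by descending load, then by name so the result is deterministic.
    for key in sorted(key_loads, key=lambda k: (-key_loads[k], k)):
        load, inst = heapq.heappop(heap)
        assignment[inst].append(key)
        heapq.heappush(heap, (load + key_loads[key], inst))
    return assignment


def instance_loads(assignment: Dict[int, List[str]], key_loads: Dict[str, float]) -> Dict[int, float]:
    """Total load per instance for a given assignment."""
    return {inst: sum(key_loads[k] for k in keys) for inst, keys in assignment.items()}
```

With loads `{"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}` on two instances, the heuristic yields `{0: ["a", "d"], 1: ["b", "c", "e"]}`, with a load of 8 on each. An exact MILP formulation can additionally penalize moving keys away from their current instance, which a greedy reassignment from scratch does not do.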

Tolerating Correlated Failures in Massively Parallel Stream Processing Engines

Authors: Su Li, Zhou Yongluan
Publication date: 04/02/2016

Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime state and can recover a failed task by restoring its runtime state from its latest checkpoint. An active approach, on the other hand, usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE). The passive approach incurs a long recovery latency, especially when a number of correlated nodes fail simultaneously, while the active approach requires extra replication resources. In this paper, we propose a new fault-tolerance framework called Passive and Partially Active (PPA). In a PPA scheme, the passive approach is applied to all tasks, while only a selected set of tasks is actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs are generated before the recovery process completes. We also propose effective and efficient algorithms to optimize a partially active replication plan that maximizes the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE, and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness of our approach.

arXiv.org e-Print Archive

University of Southern Denmark Research Output
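PPA checkpoints every task and actively replicates only the tasks that fit within a resource budget. The paper's optimization algorithms are not reproduced here. Instead, the sketch below shows the shape of a replication plan using a plain greedy knapsack heuristic that ranks tasks by benefit per resource unit. `Task`, `quality_gain`, `cost`, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    quality_gain: float  # hypothetical benefit to tentative-output quality if actively replicated
    cost: int            # hypothetical resource units one active replica needs (must be > 0)


def plan_replication(tasks: List[Task], budget: int) -> Dict[str, bool]:
    """Every task stays checkpointed (passive); True marks tasks that also get an active replica."""
    plan = {task.name: False for task in tasks}
    remaining = budget
    # Greedy by benefit per resource unit, breaking ties by name for determinism.
    for task in sorted(tasks, key=lambda x: (-x.quality_gain / x.cost, x.name)):
        if task.cost <= remaining:
            plan[task.name] = True
            remaining -= task.cost
    return plan
```

With tasks `src` (gain 4, cost 2), `join` (9, 3), `agg` (3, 1), and `sink` (1, 2) and a budget of 4, the plan actively replicates `agg` and `join`. The other two tasks rely on checkpoints alone and would produce tentative outputs while they recover. A greedy knapsack is not guaranteed to be optimal, which is one reason a dedicated optimization algorithm can do better.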

Fault Tolerant Resource Allocation for Query Processing in Grid Environments

Authors: Cokuslu Deniz, Erciyes Kayhan, Hameurlain Abdelkader
Publication venue: Inderscience Publishers
Publication date: 01/01/2015

In this paper, we propose a new algorithm for fault-tolerant resource allocation for query processing in grid environments. To this end, we propose an initial resource allocation algorithm followed by a fault-tolerance protocol. The proposed fault-tolerance protocol is based on passive replication of the stateful operators in queries. We provide theoretical analyses of the proposed algorithms and support these analyses with simulations.

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte
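Passive replication of a stateful operator can be illustrated with a small example. The primary periodically copies the operator's state to a checkpoint, and the inputs received since that checkpoint are retained so that a standby can restore the checkpoint and replay them on fail-over. The Python sketch below is hypothetical. `CountOperator`, `PassiveStandby`, and the fixed checkpoint interval are illustrative names and choices, the in-process `log` stands in for inputs that a real deployment would retain elsewhere (commonly upstream), and the paper's resource-allocation step is omitted.

```python
import copy
from typing import Dict, List


class CountOperator:
    """Stateful operator: counts tuples per key."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def process(self, key: str) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1


class PassiveStandby:
    """Primary runs the operator; the standby holds only the latest checkpoint."""

    def __init__(self, checkpoint_every: int) -> None:
        self.primary = CountOperator()
        self.checkpoint_every = checkpoint_every
        self.checkpoint: Dict[str, int] = {}
        # Inputs received since the last checkpoint (stand-in for retained upstream inputs).
        self.log: List[str] = []

    def on_input(self, key: str) -> None:
        self.primary.process(key)
        self.log.append(key)
        if len(self.log) >= self.checkpoint_every:
            self.checkpoint = copy.deepcopy(self.primary.counts)
            self.log.clear()  # inputs covered by the checkpoint no longer need replay

    def fail_over(self) -> CountOperator:
        """Restore the checkpoint on a fresh operator, then replay the logged inputs."""
        standby = CountOperator()
        standby.counts = copy.deepcopy(self.checkpoint)
        for key in self.log:
            standby.process(key)
        self.primary = standby
        return standby
```

After the inputs `a, a, b, a, b` with a checkpoint every 3 inputs, the checkpoint holds `{"a": 2, "b": 1}` and the log holds `["a", "b"]`. Fail-over then rebuilds `{"a": 3, "b": 2}`, which matches the failed primary. The checkpoint interval trades runtime overhead (how often state is copied) against recovery time (how much must be replayed).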

Recovery From Node Failure in Distributed Query Processing

Author: Taylor Nicholas E.
Publication venue: ScholarlyCommons
Publication date: 12/11/2008

While distributed query processing has many advantages, the use of many independent, physically widespread computers almost universally leads to reliability issues. Several techniques have been developed to provide redundancy and the ability to recover from node failure during query processing. In this survey, we examine three techniques (upstream backup, active standby, and passive standby) that have been used in both distributed stream data processing and the distributed processing of static data. We also compare several recent systems that use these techniques and explore which recovery techniques work well under various conditions.

ScholarlyCommons@Penn
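Upstream backup, the first of the three techniques named above, can be sketched for a windowed operator. The upstream node keeps each tuple it sends until the downstream operator signals that the tuple no longer affects its state. If the downstream node fails, a replacement rebuilds its state by replaying the retained tuples. The Python sketch below is a hypothetical illustration. `WindowSum`, `UpstreamBackup`, and the "ack on eviction" rule are assumptions chosen so that the buffer always contains exactly what a replacement needs.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Item = Tuple[int, int]  # (sequence number, value)


class WindowSum:
    """Downstream operator: sum over the last `size` tuples."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.window: Deque[Item] = deque()

    def process(self, item: Item) -> Optional[int]:
        self.window.append(item)
        if len(self.window) > self.size:
            evicted_seq, _ = self.window.popleft()
            return evicted_seq  # ack: upstream may discard this tuple and anything older
        return None

    def result(self) -> int:
        return sum(v for _, v in self.window)


class UpstreamBackup:
    """Upstream node that buffers sent tuples until the downstream acknowledges them."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.buffer: Deque[Item] = deque()

    def send(self, downstream: WindowSum, value: int) -> None:
        item = (self.next_seq, value)
        self.next_seq += 1
        self.buffer.append(item)  # retain until downstream no longer needs it
        ack = downstream.process(item)
        if ack is not None:
            self.trim(ack)

    def trim(self, upto_seq: int) -> None:
        while self.buffer and self.buffer[0][0] <= upto_seq:
            self.buffer.popleft()

    def recover(self, size: int) -> WindowSum:
        """Rebuild a failed downstream operator by replaying the buffered tuples."""
        replacement = WindowSum(size)
        for item in self.buffer:
            replacement.process(item)
        return replacement
```

After sending `1, 2, 3, 4, 5` into a window of size 3, the upstream buffer holds only the last three tuples, and a recovered operator reports the same sum (12) as the failed one. Upstream backup adds almost no runtime overhead, but recovery time grows with the amount of buffered input to replay, and it cannot help if the upstream node fails as well.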

Fault-Tolerance Implementation in Typical Distributed Stream Processing Systems

Authors: JICHIANG TSA, WUHONG CHEN
Publication date: 03/08/2015

National Chung Hsing University Institutional Repository