30,815 research outputs found

    Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

    Full text link
    The availability of large number of processing nodes in a parallel and distributed computing environment enables sophisticated real time processing over high speed data streams, as required by many emerging applications. Sliding window stream joins are among the most important operators in a stream processing system. In this paper, we consider the issue of parallelizing a sliding window stream join operator over a shared nothing cluster. We propose a framework, based on fixed or predefined communication pattern, to distribute the join processing loads over the shared-nothing cluster. We consider various overheads while scaling over a large number of nodes, and propose solution methodologies to cope with the issues. We implement the algorithm over a cluster using a message passing system, and present the experimental results showing the effectiveness of the join processing algorithm.Comment: 11 page

    Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning

    Get PDF
    Fine tuning distributed systems is considered to be a craftsmanship, relying on intuition and experience. This becomes even more challenging when the systems need to react in near real time, as streaming engines have to do to maintain pre-agreed service quality metrics. In this article, we present an automated approach that builds on a combination of supervised and reinforcement learning methods to recommend the most appropriate lever configurations based on previous load. With this, streaming engines can be automatically tuned without requiring a human to determine the right way and proper time to deploy them. This opens the door to new configurations that are not being applied today since the complexity of managing these systems has surpassed the abilities of human experts. We show how reinforcement learning systems can find substantially better configurations in less time than their human counterparts and adapt to changing workloads

    Self-* overload control for distributed web systems

    Full text link
    Unexpected increases in demand and most of all flash crowds are considered the bane of every web application as they may cause intolerable delays or even service unavailability. Proper quality of service policies must guarantee rapid reactivity and responsiveness even in such critical situations. Previous solutions fail to meet common performance requirements when the system has to face sudden and unpredictable surges of traffic. Indeed they often rely on a proper setting of key parameters which requires laborious manual tuning, preventing a fast adaptation of the control policies. We contribute an original Self-* Overload Control (SOC) policy. This allows the system to self-configure a dynamic constraint on the rate of admitted sessions in order to respect service level agreements and maximize the resource utilization at the same time. Our policy does not require any prior information on the incoming traffic or manual configuration of key parameters. We ran extensive simulations under a wide range of operating conditions, showing that SOC rapidly adapts to time varying traffic and self-optimizes the resource utilization. It admits as many new sessions as possible in observance of the agreements, even under intense workload variations. We compared our algorithm to previously proposed approaches highlighting a more stable behavior and a better performance.Comment: The full version of this paper, titled "Self-* through self-learning: overload control for distributed web systems", has been published on Computer Networks, Elsevier. The simulator used for the evaluation of the proposed algorithm is available for download at the address: http://www.dsi.uniroma1.it/~novella/qos_web

    Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores

    Get PDF
    Modern business applications and scientific databases call for inherently dynamic data storage environments. Such environments are characterized by two challenging features: (a) they have little idle system time to devote on physical design; and (b) there is little, if any, a priori workload knowledge, while the query and data workload keeps changing dynamically. In such environments, traditional approaches to index building and maintenance cannot apply. Database cracking has been proposed as a solution that allows on-the-fly physical data reorganization, as a collateral effect of query processing. Cracking aims to continuously and automatically adapt indexes to the workload at hand, without human intervention. Indexes are built incrementally, adaptively, and on demand. Nevertheless, as we show, existing adaptive indexing methods fail to deliver workload-robustness; they perform much better with random workloads than with others. This frailty derives from the inelasticity with which these approaches interpret each query as a hint on how data should be stored. Current cracking schemes blindly reorganize the data within each query's range, even if that results into successive expensive operations with minimal indexing benefit. In this paper, we introduce stochastic cracking, a significantly more resilient approach to adaptive indexing. Stochastic cracking also uses each query as a hint on how to reorganize data, but not blindly so; it gains resilience and avoids performance bottlenecks by deliberately applying certain arbitrary choices in its decision-making. Thereby, we bring adaptive indexing forward to a mature formulation that confers the workload-robustness previous approaches lacked. Our extensive experimental study verifies that stochastic cracking maintains the desired properties of original database cracking while at the same time it performs well with diverse realistic workloads.Comment: VLDB201
    corecore