30,815 research outputs found
Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster
The availability of large number of processing nodes in a parallel and
distributed computing environment enables sophisticated real time processing
over high speed data streams, as required by many emerging applications.
Sliding window stream joins are among the most important operators in a stream
processing system. In this paper, we consider the issue of parallelizing a
sliding window stream join operator over a shared nothing cluster. We propose a
framework, based on fixed or predefined communication pattern, to distribute
the join processing loads over the shared-nothing cluster. We consider various
overheads while scaling over a large number of nodes, and propose solution
methodologies to cope with the issues. We implement the algorithm over a
cluster using a message passing system, and present the experimental results
showing the effectiveness of the join processing algorithm.Comment: 11 page
Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning
Fine tuning distributed systems is considered to be a craftsmanship, relying
on intuition and experience. This becomes even more challenging when the
systems need to react in near real time, as streaming engines have to do to
maintain pre-agreed service quality metrics. In this article, we present an
automated approach that builds on a combination of supervised and reinforcement
learning methods to recommend the most appropriate lever configurations based
on previous load. With this, streaming engines can be automatically tuned
without requiring a human to determine the right way and proper time to deploy
them. This opens the door to new configurations that are not being applied
today since the complexity of managing these systems has surpassed the
abilities of human experts. We show how reinforcement learning systems can find
substantially better configurations in less time than their human counterparts
and adapt to changing workloads
Self-* overload control for distributed web systems
Unexpected increases in demand and most of all flash crowds are considered
the bane of every web application as they may cause intolerable delays or even
service unavailability. Proper quality of service policies must guarantee rapid
reactivity and responsiveness even in such critical situations. Previous
solutions fail to meet common performance requirements when the system has to
face sudden and unpredictable surges of traffic. Indeed they often rely on a
proper setting of key parameters which requires laborious manual tuning,
preventing a fast adaptation of the control policies. We contribute an original
Self-* Overload Control (SOC) policy. This allows the system to self-configure
a dynamic constraint on the rate of admitted sessions in order to respect
service level agreements and maximize the resource utilization at the same
time. Our policy does not require any prior information on the incoming traffic
or manual configuration of key parameters. We ran extensive simulations under a
wide range of operating conditions, showing that SOC rapidly adapts to time
varying traffic and self-optimizes the resource utilization. It admits as many
new sessions as possible in observance of the agreements, even under intense
workload variations. We compared our algorithm to previously proposed
approaches highlighting a more stable behavior and a better performance.Comment: The full version of this paper, titled "Self-* through self-learning:
overload control for distributed web systems", has been published on Computer
Networks, Elsevier. The simulator used for the evaluation of the proposed
algorithm is available for download at the address:
http://www.dsi.uniroma1.it/~novella/qos_web
Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores
Modern business applications and scientific databases call for inherently
dynamic data storage environments. Such environments are characterized by two
challenging features: (a) they have little idle system time to devote on
physical design; and (b) there is little, if any, a priori workload knowledge,
while the query and data workload keeps changing dynamically. In such
environments, traditional approaches to index building and maintenance cannot
apply. Database cracking has been proposed as a solution that allows on-the-fly
physical data reorganization, as a collateral effect of query processing.
Cracking aims to continuously and automatically adapt indexes to the workload
at hand, without human intervention. Indexes are built incrementally,
adaptively, and on demand. Nevertheless, as we show, existing adaptive indexing
methods fail to deliver workload-robustness; they perform much better with
random workloads than with others. This frailty derives from the inelasticity
with which these approaches interpret each query as a hint on how data should
be stored. Current cracking schemes blindly reorganize the data within each
query's range, even if that results into successive expensive operations with
minimal indexing benefit. In this paper, we introduce stochastic cracking, a
significantly more resilient approach to adaptive indexing. Stochastic cracking
also uses each query as a hint on how to reorganize data, but not blindly so;
it gains resilience and avoids performance bottlenecks by deliberately applying
certain arbitrary choices in its decision-making. Thereby, we bring adaptive
indexing forward to a mature formulation that confers the workload-robustness
previous approaches lacked. Our extensive experimental study verifies that
stochastic cracking maintains the desired properties of original database
cracking while at the same time it performs well with diverse realistic
workloads.Comment: VLDB201
- …