2 research outputs found

    Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

    Full text link
    The availability of large number of processing nodes in a parallel and distributed computing environment enables sophisticated real time processing over high speed data streams, as required by many emerging applications. Sliding window stream joins are among the most important operators in a stream processing system. In this paper, we consider the issue of parallelizing a sliding window stream join operator over a shared nothing cluster. We propose a framework, based on fixed or predefined communication pattern, to distribute the join processing loads over the shared-nothing cluster. We consider various overheads while scaling over a large number of nodes, and propose solution methodologies to cope with the issues. We implement the algorithm over a cluster using a message passing system, and present the experimental results showing the effectiveness of the join processing algorithm.Comment: 11 page

    Adaptive load diffusion for stream joins

    No full text
    Abstract. Data stream processing has become increasingly important as many emerging applications call for sophisticated realtime processing over data streams, such as stock trading surveillance, network traffic monitoring, and sensor data analysis. Stream joins are among the most important stream processing operations, which can be used to detect linkages and correlations between different data streams. One major challenge in processing stream joins is to handle continuous, high-volume, and time-varying data streams under resource constraints. In this paper, we present a novel load diffusion system to enable scalable execution of resource-intensive stream joins using an ensemble of server hosts. The load diffusion is achieved by a simple correlation-aware stream partition algorithm. Different from previous work, the load diffusion system can (1) achieve fine-grained load sharing in the distributed stream processing system; and (2) produce exact query answers without missing any join results or generate duplicate join results. Our experimental results show that the load diffusion scheme can greatly improve the system throughput and achieve more balanced load distribution.
    corecore