5 research outputs found
Efficient Processing of Continuous Join Queries using Distributed Hash Tables
This paper addresses the problem of computing approximate answers to continuous join queries. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) and dissemination of queries using a gossip-style protocol. We provide a performance evaluation of DHTJoin which shows that DHTJoin can achieve significant performance gains in terms of network traffic.
Continuous Multi-Way Joins over Distributed Hash Tables
This paper studies the problem of evaluating continuous multi-way
joins on top of Distributed Hash Tables (DHTs). We present a novel
algorithm, called recursive join (RJoin), that takes into account
various parameters crucial in a distributed setting, such as network
traffic, query processing load distribution, and storage load
distribution. The key idea of RJoin is incremental evaluation:
as relevant tuples arrive continuously, a given multi-way join is
rewritten continuously into a join with fewer join operators, and is
assigned continuously to different nodes of the network. In this
way, RJoin distributes the responsibility of evaluating a continuous
multi-way join to many network nodes by assigning parts of the
evaluation of each binary join to a different node depending on the
values of the join attributes. The actual nodes to be involved are
decided by RJoin dynamically after taking into account the rate of
incoming tuples with values equal to the values of the joined
attributes. RJoin also supports sliding window joins, which is a
crucial feature, especially for long join paths, since it provides a
mechanism to reduce the query processing state and thus keep the
cost of handling incoming tuples stable. In addition, RJoin is able
to handle message delays due to heavy network traffic. We present a
detailed mathematical and experimental analysis of RJoin and study
the performance tradeoffs that occur
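The incremental evaluation idea described in this abstract can be illustrated with a minimal sketch. This is not the RJoin algorithm itself: the `PartialQuery` type, the `rewrite` and `node_for` functions, the natural-join semantics on shared attribute names, and the modulo-based node assignment are all assumptions made for illustration. Each arriving tuple that matches turns the query into one with fewer relations left to join, and the rewritten query could then be reassigned to another node.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialQuery:
    pending: frozenset  # names of relations that still have to contribute a tuple
    bindings: tuple     # sorted (attribute, value) pairs fixed by earlier tuples


def node_for(key: str, num_nodes: int) -> int:
    """Hypothetical placement: hash a key onto one of num_nodes nodes."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def rewrite(query, relation, tup):
    """Absorb one tuple and return a query with one fewer pending relation.

    Returns None if the relation is not awaited or if the tuple disagrees
    with a value that is already bound (natural join on attribute names).
    """
    if relation not in query.pending:
        return None
    bound = dict(query.bindings)
    for attr, value in tup.items():
        if attr in bound and bound[attr] != value:
            return None
    bound.update(tup)
    return PartialQuery(query.pending - {relation}, tuple(sorted(bound.items())))


# A three-way join over R, S and T shrinks as matching tuples arrive.
q0 = PartialQuery(frozenset({"R", "S", "T"}), ())
q1 = rewrite(q0, "R", {"a": 1, "b": 2})
q2 = rewrite(q1, "S", {"b": 2, "c": 3})
q3 = rewrite(q2, "T", {"c": 3, "d": 4})
```

When `pending` becomes empty, the accumulated bindings form one join result. In this sketch a rewritten query could be sent to `node_for(...)` applied to the value of the next join attribute, which loosely mirrors assigning parts of the evaluation to different nodes depending on join attribute values.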
DHTJoin: Processing Continuous Join Queries Using DHT Networks
Continuous query processing in data stream management systems (DSMS) has received considerable attention recently. Many applications share the same need for processing data streams in a continuous fashion. For most distributed streaming applications, the centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing approximate answers to continuous join queries over distributed data streams. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) and dissemination of queries by exploiting the embedded trees in the underlying DHT, thereby incurring little overhead. DHTJoin also deals with join attribute value skew, which may hurt load balancing and result completeness. We provide a performance evaluation of DHTJoin which shows that it can achieve significant performance gains in terms of network traffic.
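Hash-based placement of tuples, as mentioned in this abstract, can be sketched as follows. This is an illustration under stated assumptions, not DHTJoin's implementation: the `Node` class, the `publish` and `responsible_node` functions, and the modulo-based mapping are hypothetical, and a real DHT would map keys onto an identifier space rather than onto a fixed-size list. The point is that tuples from both streams with the same join attribute value reach the same node, which can then join them locally.

```python
import hashlib
from collections import defaultdict


def responsible_node(join_value, num_nodes):
    """Hypothetical mapping from a join attribute value to a node index."""
    digest = hashlib.sha1(str(join_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


class Node:
    def __init__(self):
        # (stream name, join value) -> tuples stored at this node
        self.stored = defaultdict(list)

    def insert(self, stream, other_stream, join_value, tup):
        """Store tup and return its matches against the other stream."""
        matches = [(tup, other) for other in self.stored[(other_stream, join_value)]]
        self.stored[(stream, join_value)].append(tup)
        return matches


def publish(nodes, stream, other_stream, join_value, tup):
    """Route a tuple to the node responsible for its join attribute value."""
    node = nodes[responsible_node(join_value, len(nodes))]
    return node.insert(stream, other_stream, join_value, tup)


nodes = [Node() for _ in range(4)]
first = publish(nodes, "R", "S", 7, ("r1", 7))   # nothing to join with yet
second = publish(nodes, "S", "R", 7, ("s1", 7))  # meets ("r1", 7) at the same node
third = publish(nodes, "S", "R", 8, ("s2", 8))   # different join value, no match
```

Because placement depends only on the join attribute value, a very frequent value sends all of its tuples to a single node, which is the kind of skew the abstract says may hurt load balancing.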
Processing Continuous Queries on Semantic Data over Distributed Hash Tables
In this work I present the implementation of a system for processing continuous
queries on semantic data over distributed hash tables. The system runs
on a distributed network and implements the RJoin algorithm for indexing
incoming queries to nodes, while also using the CQC and CSBV algorithms for
comparison in terms of speed and number of messages. Communication and data
transfer services are supplied by the distributed hash table system Pastry.