    AN OPTIMAL EVALUATION OF "GROUPBY-JOIN" QUERIES IN DISTRIBUTED ARCHITECTURES

    SQL queries involving join and group-by operations are common in decision support applications, where the input relations are usually very large, so parallelizing these queries is highly recommended in order to obtain an acceptable response time. Several parallel algorithms for this kind of query have been presented in the literature. However, their most significant drawbacks are that they are very sensitive to data skew and incur expensive communication and Input/Output costs in the evaluation of the join operation. In this paper, we present an algorithm that overcomes these drawbacks: it evaluates the "GroupBy-Join" query without directly evaluating the costly join operation, thus reducing its Input/Output and communication costs. Furthermore, the performance of this algorithm is analyzed using the scalable and portable BSP (Bulk Synchronous Parallel) cost model, which predicts a linear speedup even for highly skewed data.
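The abstract does not describe the algorithm itself, so the following is only a minimal single-machine sketch of the general idea of answering a group-by over a join without materializing the join. It assumes the simplest case, where the grouping attribute is also the join attribute and the aggregate is SUM over a column of R. The function names and the row layout are hypothetical and are not taken from the paper; the distributed, skew-handling parts of the authors' method are not modelled here.

```python
from collections import defaultdict


def groupby_join_sum(r_rows, s_rows):
    """SUM(R.v) per key k for R JOIN S ON R.k = S.k, without building the join.

    Assumption (not from the paper): r_rows holds (k, v) pairs and s_rows
    holds bare join keys k.
    """
    # Partial aggregate of R: sum of v for each key.
    r_sum = defaultdict(int)
    for k, v in r_rows:
        r_sum[k] += v

    # Partial aggregate of S: number of matching tuples for each key.
    s_count = defaultdict(int)
    for k in s_rows:
        s_count[k] += 1

    # Every R tuple with key k appears s_count[k] times in the join result,
    # so the joined sum is the product of the two partial aggregates.
    return {k: r_sum[k] * s_count[k] for k in r_sum if k in s_count}


def naive_join_then_group(r_rows, s_rows):
    """Reference version: nested-loop join, then aggregate (for comparison)."""
    out = defaultdict(int)
    for k, v in r_rows:
        for k2 in s_rows:
            if k == k2:
                out[k] += v
    return dict(out)


if __name__ == "__main__":
    r = [(1, 10), (1, 5), (2, 7), (3, 1)]
    s = [1, 1, 2, 4]
    print(groupby_join_sum(r, s))
    print(naive_join_then_group(r, s))
```

Both functions return the same result, but the first one touches each input tuple once and only ever holds one aggregate per distinct key. In a parallel setting, that is the kind of saving in intermediate data that reduces Input/Output and communication volume.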