An Efficient Equi-semi-join Algorithm for Distributed Architectures
A skew-insensitive algorithm for join and multi-join operations on Shared Nothing machines
Join is an expensive and frequently used operation whose parallelization is highly desirable. However, the effectiveness of parallel joins depends on the ability to divide the load evenly among processors, and data skew can have a disastrous effect on performance. Although many skew-handling algorithms have been proposed, they remain generally inefficient for multi-joins because of join-product skew and costly, unnecessary redistribution and communication. An earlier paper introduced fa join, a parallel join algorithm with deterministic and near-perfect balancing properties. Despite its advantages, fa join is sensitive to the correlation of the attribute value distributions in both relations. We present here an improved version of the algorithm, called Sfa join, that treats both relations symmetrically. Its predictably low join-product and attribute-value skew make it suitable for repeated use in multi-join operations. Its performance is analyzed theoretically and expe..