20,477 research outputs found

    Fast Parallel Operations on Search Trees

    Full text link
    Using (a,b)-trees as an example, we show how to perform a parallel split with logarithmic latency and parallel join, bulk updates, intersection, union (or merge), and (symmetric) set difference with logarithmic latency and with information theoretically optimal work. We present both asymptotically optimal solutions and simplified versions that perform well in practice - they are several times faster than previous implementations

    Dynamic load balancing in parallel KD-tree k-means

    Get PDF
    One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy

    Parallel Weighted Random Sampling

    Get PDF
    Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries

    Subtree weight ratios for optimal binary search trees

    Get PDF
    For an optimal binary search tree T with a subtree S(d) at a distance d from the root of T, we study the ratio of the weight of S(d) to the weight of T. The maximum possible value, which we call ρ(d), of the ratio of weights, is found to have an upper bound of 2/F_d+3 where F_i is the ith Fibonacci number. For d = 1, 2, 3, and 4, the bound is shown to be tight. For larger d, the Fibonacci bound gives ρ(d) = O(ϕ^d) where ϕ ~ .61803 is the golden ratio. By giving a particular set of optimal trees, we prove ρ(d) = Ω((.58578 ... )^d), and believe a similar proof follows for ρ(d) = Ω((.60179 ... )^d). If we include frequencies for unsuccessful searches in the optimal binary search trees, the Fibonacci bound is found to be tight
    corecore