2,167 research outputs found

    Fast deterministic processor allocation

    Interval allocation has been suggested as a possible formalization, for the PRAM, of the (vaguely defined) processor allocation problem, which is of fundamental importance in parallel computing. The interval allocation problem is, given $n$ nonnegative integers $x_1,\ldots,x_n$, to allocate $n$ nonoverlapping subarrays of sizes $x_1,\ldots,x_n$ from within a base array of $O(\sum_{j=1}^n x_j)$ cells. We show that interval allocation problems of size $n$ can be solved in $O((\log\log n)^3)$ time with optimal speedup on a deterministic CRCW PRAM. In addition to a general solution to the processor allocation problem, this implies an improved deterministic algorithm for the problem of approximate summation. For both interval allocation and approximate summation, the fastest previous deterministic algorithms have running times of $\Theta(\log n / \log\log n)$. We also describe an application to the problem of computing the connected components of an undirected graph.
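
    A minimal sequential sketch of the problem statement may help: a plain prefix sum over the sizes already solves interval allocation with a base array of exactly $\sum_{j=1}^n x_j$ cells. The difficulty the paper addresses is doing this fast and deterministically in parallel, where only approximate sums are computable quickly, hence the $O(\sum_{j=1}^n x_j)$ slack in the problem statement. The code below is illustrative only and not from the paper.

        def interval_allocation(sizes):
            """Sequential interval allocation via a running prefix sum.

            Given nonnegative integers x_1..x_n, return the start offset of
            each of n nonoverlapping subarrays of those sizes inside a base
            array of exactly sum(x_j) cells (the problem permits O(sum x_j)).
            """
            offsets, total = [], 0
            for x in sizes:
                offsets.append(total)   # subarray j occupies [total, total + x)
                total += x
            return offsets, total       # total = cells used in the base array

        # Example: sizes 3, 0, 2 receive disjoint blocks starting at 0, 3, 3.
        starts, base = interval_allocation([3, 0, 2])
        assert starts == [0, 3, 3] and base == 5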

    Parallel Weighted Random Sampling

    Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, $k$ items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups both for construction and queries.
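
    For reference, the classical sequential baseline that the paper improves and parallelizes is Walker/Vose alias-table construction, which takes linear time to build and answers each weighted-sampling query in constant time. This sketch is the textbook method, not the authors' code.

        import random

        def build_alias_table(weights):
            """Vose's O(n) alias-table construction (sequential baseline)."""
            n, total = len(weights), sum(weights)
            prob = [w * n / total for w in weights]   # scaled so the mean is 1
            alias = [0] * n
            small = [i for i, p in enumerate(prob) if p < 1.0]
            large = [i for i, p in enumerate(prob) if p >= 1.0]
            while small and large:
                s, l = small.pop(), large.pop()
                alias[s] = l                    # s's leftover probability -> l
                prob[l] -= 1.0 - prob[s]        # l donated mass (1 - prob[s])
                (small if prob[l] < 1.0 else large).append(l)
            for i in small + large:             # leftovers are 1 up to rounding
                prob[i] = 1.0
            return prob, alias

        def sample(prob, alias):
            """One weighted draw in O(1): pick a column, flip a biased coin."""
            i = random.randrange(len(prob))
            return i if random.random() < prob[i] else alias[i]

        prob, alias = build_alias_table([1.0, 3.0, 6.0])
        counts = [0, 0, 0]
        for _ in range(10_000):
            counts[sample(prob, alias)] += 1    # roughly 10% / 30% / 60%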

    Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph, with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the work on the Hyperlink graph uses distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly available as the Graph-Based Benchmark Suite (GBBS).
    Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 201

    A lower bound for linear approximate compaction

    Get PDF
    The $\lambda$-approximate compaction problem is: given an input array of $n$ values, each either 0 or 1, place each value in an output array so that all the 1's are in the first $(1+\lambda)k$ array locations, where $k$ is the number of 1's in the input and $\lambda$ is an accuracy parameter. This problem is of fundamental importance in parallel computation because of its applications to processor allocation and approximate counting. When $\lambda$ is a constant, the problem is called Linear Approximate Compaction (LAC). On the CRCW PRAM model, there is an algorithm that solves approximate compaction in $O((\log\log n)^3)$ time for $\lambda = \frac{1}{\log\log n}$, using $\frac{n}{(\log\log n)^3}$ processors. Our main result shows that this is close to the best possible. Specifically, we prove that LAC requires $\Omega(\log\log n)$ time using $O(n)$ processors. We also give a tradeoff between $\lambda$ and the processing time: for $\epsilon < 1$ and $\lambda = n^{\epsilon}$, the time required is $\Omega(\log\frac{1}{\epsilon})$.
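
    To make the compaction condition concrete, the small checker below tests whether an output array is a valid $\lambda$-approximate compaction of an input array; it is illustrative only, since the paper proves a lower bound rather than giving an algorithm.

        import math

        def is_lambda_approximate_compaction(inp, out, lam):
            """Check that all k ones of `inp` appear within the first
            (1 + lam) * k cells of `out`, and that no ones were lost."""
            k = sum(inp)
            bound = math.floor((1 + lam) * k)
            return sum(out[:bound]) == k and sum(out) == k

        # k = 2 ones must land within the first (1 + 0.5) * 2 = 3 cells.
        assert is_lambda_approximate_compaction(
            [1, 0, 0, 1, 0, 0], [1, 0, 1, 0, 0, 0], lam=0.5)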