2,245 research outputs found

    Efficient Parallel Random Sampling : Vectorized, Cache-Efficient, and Online

    We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time $\mathcal{O}(n/p+\log p)$ on $p$ processors, i.e., it scales to massively parallel machines even for moderate values of $n$. The amount of communication between the processors is very small (at most $\mathcal{O}(\log p)$) and independent of the sample size. We also discuss modifications needed for load balancing, online sampling, sampling with replacement, Bernoulli sampling, and vectorization on SIMD units or GPUs.
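
    The divide-and-conquer idea in this abstract can be illustrated with a small sequential sketch. It rests on an assumption the abstract does not spell out: the range is split in half, the number of samples falling into the left half is drawn from a hypergeometric distribution, and both halves are then handled recursively. The names `hypergeometric`, `sample_without_replacement`, and the cutoff `base` are hypothetical, and the slow loop-based hypergeometric generator is only a stand-in for a proper one.

```python
import random


def hypergeometric(rng, good, bad, draws):
    """Count 'good' items among `draws` items drawn without replacement.

    Simple O(draws) simulation; a stand-in for a faster generator.
    """
    hits = 0
    for _ in range(draws):
        # The next item is 'good' with probability good / (good + bad).
        if rng.random() * (good + bad) < good:
            hits += 1
            good -= 1
        else:
            bad -= 1
    return hits


def sample_without_replacement(rng, lo, hi, n, base=64):
    """Return n distinct integers from lo..hi (inclusive), in sorted order."""
    size = hi - lo + 1
    if n == 0:
        return []
    if size <= base:
        # Small subproblem: fall back to the standard library sampler.
        return sorted(rng.sample(range(lo, hi + 1), n))
    # Split the range into lo..mid and mid+1..hi.
    mid = lo + size // 2 - 1
    left_size = mid - lo + 1
    # Decide how many of the n samples land in the left half.
    n_left = hypergeometric(rng, left_size, size - left_size, n)
    # The two halves are now independent subproblems.
    left = sample_without_replacement(rng, lo, mid, n_left, base)
    right = sample_without_replacement(rng, mid + 1, hi, n - n_left, base)
    return left + right


if __name__ == "__main__":
    rng = random.Random(2024)
    print(sample_without_replacement(rng, 1, 1_000_000, 10))
```

    Because the two recursive calls share no state once `n_left` is fixed, they could in principle be handed to different processors; this sketch does not attempt that, nor the load balancing or vectorization mentioned in the abstract.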

    Improved Algorithms for Computing the Cycle of Minimum Cost-to-Time Ratio in Directed Graphs

    We study the problem of finding the cycle of minimum cost-to-time ratio in a directed graph with n nodes and m edges. This problem has a long history in combinatorial optimization and has recently seen interesting applications in the context of quantitative verification. We focus on strongly polynomial algorithms to cover the use-case where the weights are relatively large compared to the size of the graph. Our main result is an algorithm with running time ~O(m^{3/4} n^{3/2}), which gives the first improvement over Megiddo's ~O(n^3) algorithm [JACM'83] for sparse graphs. (We use the notation ~O(.) to hide factors that are polylogarithmic in n.) We further demonstrate how to obtain both an algorithm with running time n^3 / 2^{Omega(sqrt(log n))} on general graphs and an algorithm with running time ~O(n) on constant treewidth graphs. To obtain our main result, we develop a parallel algorithm for negative cycle detection and single-source shortest paths that might be of independent interest.
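
    To make the problem statement concrete, here is a minimal sketch of the textbook reduction from the minimum cost-to-time ratio to negative cycle detection. It is not the algorithm of this paper and is not strongly polynomial: it bisects on a candidate ratio lam and runs Bellman-Ford on the edge weights c - lam * t. The function names, the edge format (u, v, cost, time), and the assumption that every edge time is strictly positive are all illustrative choices.

```python
def has_negative_cycle(n, edges, lam):
    """Bellman-Ford check for a negative cycle under weights c - lam * t."""
    # Starting every node at distance 0 acts like a virtual source
    # connected to all nodes by zero-weight edges.
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, c, t in edges:
            w = c - lam * t
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle
    return True  # still relaxing after n rounds


def min_ratio_cycle_value(n, edges, iterations=100):
    """Approximate the minimum of (sum of costs) / (sum of times) over cycles.

    Assumes t > 0 on every edge. Returns None if the graph has no cycle.
    """
    ratios = [c / t for _, _, c, t in edges]
    lo, hi = min(ratios), max(ratios)
    # Above the largest edge ratio every edge weight is negative,
    # so any cycle at all shows up as a negative cycle.
    if not has_negative_cycle(n, edges, hi + 1.0):
        return None
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if has_negative_cycle(n, edges, mid):
            hi = mid  # some cycle has ratio below mid
        else:
            lo = mid  # every cycle has ratio at least mid
    return hi


if __name__ == "__main__":
    # Cycle 0 -> 1 -> 0 has ratio 4 / 2; cycle 1 -> 2 -> 1 has ratio 2 / 4.
    edges = [(0, 1, 2, 1), (1, 0, 2, 1), (1, 2, 1, 1), (2, 1, 1, 3)]
    print(min_ratio_cycle_value(3, edges))
```

    A cycle has ratio below lam exactly when its total weight under c - lam * t is negative, which is why negative cycle detection, the subroutine highlighted at the end of the abstract, sits at the core of this problem.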