
    A Lower Bound Technique for Communication in BSP

    Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This paper presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT), and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models different from BSP, such as the I/O model and the LPRAM.
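
    The notion of switching potential can be illustrated on a toy network. The sketch below is a simplified reading of the definition, not the paper's formal construction: it assumes each switch joins two wires and can be set either straight or crossed, and it counts the distinct permutations reachable over all settings. The function name `realized_permutations` and the 4-input butterfly layout are illustrative assumptions.

```python
from itertools import product

def realized_permutations(num_wires, layers):
    """Return the set of wire permutations reachable over all switch settings.

    `layers` is a list of layers; each layer is a list of (i, j) wire pairs
    that share a 2x2 switch. Switches within one layer touch disjoint wires.
    """
    switches = [pair for layer in layers for pair in layer]
    perms = set()
    # Try every straight/crossed assignment (exponential; toy sizes only).
    for setting in product((False, True), repeat=len(switches)):
        wires = list(range(num_wires))
        for (i, j), crossed in zip(switches, setting):
            if crossed:
                wires[i], wires[j] = wires[j], wires[i]
        perms.add(tuple(wires))
    return perms

# A 4-input butterfly (the FFT dependency pattern), assumed layout.
butterfly4 = [[(0, 2), (1, 3)], [(0, 1), (2, 3)]]
print(len(realized_permutations(4, butterfly4)))
```

    Brute-force enumeration like this is only feasible for very small networks; it is meant to make the quantity concrete, not to compute the bounds discussed in the abstract.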

    Optimized Merge Sort on Modern Commodity Multi-core CPUs

    Sorting is one of the most widely used basic algorithms. As high-performance computing devices become increasingly common, more and more modern commodity machines are capable of parallel computing. A new implementation of sorting is proposed to harness the SIMD operations and multi-core computing provided by modern CPUs. The algorithm is a hybrid of an optimized bitonic sorting network and a multi-way merge. New SIMD instructions provided by modern CPUs are used in the bitonic network implementation, which adopts a different data arrangement so that the number of SIMD operations is reduced. Balanced binary trees are used in the multi-way merge, which also differs from earlier implementations. Effort is also devoted to minimizing data movement in memory, since merge sort is a memory-bound application. The performance evaluation shows that the proposed algorithm is twice as fast as the sort function in the C++ standard library when only a single thread is used. It also outperforms the radix sort implemented in the Boost library.
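
    The overall shape of such a hybrid (sort small blocks with a bitonic network, then multi-way merge the sorted runs) can be sketched as follows. This is a scalar illustration only: it uses no SIMD, and it swaps in Python's heap-based `heapq.merge` where the abstract describes balanced binary trees. The names `bitonic_sort` and `hybrid_sort`, the block size, and the infinity padding (which assumes numeric data) are all assumptions made for the example.

```python
import heapq

def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # comparator distance within this merge step
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = a[i] > a[partner] if ascending else a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

def hybrid_sort(data, chunk=8):
    """Sort numeric data: bitonic-sort fixed-size blocks, then k-way merge."""
    runs = []
    for start in range(0, len(data), chunk):
        block = list(data[start:start + chunk])
        pad = chunk - len(block)
        # Pad the last block with +inf so the network sees a full block.
        block = bitonic_sort(block + [float("inf")] * pad)
        runs.append(block[:chunk - pad])
    return list(heapq.merge(*runs))

print(hybrid_sort([9, 4, 7, 1, 8, 2, 6, 3, 5, 0, 11, 10]))
```

    The fixed comparison pattern of the bitonic stage is what makes it a natural fit for SIMD, since the same compare-and-swap is applied to many element pairs at once.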

    The $p$-Center Problem in Tree Networks Revisited

    We present two improved algorithms for the weighted discrete $p$-center problem for tree networks with $n$ vertices. One of our proposed algorithms runs in $O(n \log n + p \log^2 n \log(n/p))$ time. For all values of $p$, our algorithm thus runs as fast as or faster than the most efficient $O(n \log^2 n)$ time algorithm obtained by applying Cole's speed-up technique [cole1987] to the algorithm due to Megiddo and Tamir [megiddo1983], which has remained unchallenged for nearly 30 years. Our other algorithm, which is more practical, runs in $O(n \log n + p^2 \log^2(n/p))$ time, and when $p = O(\sqrt{n})$ it is faster than Megiddo and Tamir's $O(n \log^2 n \log\log n)$ time algorithm [megiddo1983].

    Faster 3-Periodic Merging Networks

    We consider the problem of merging two sorted sequences on a comparator network that is used repeatedly, that is, if the output is not sorted, the network is applied again using the output as input. The challenging task is to construct such networks of small depth. The first constructions of merging networks with a constant period were given by Kutyłowski, Loryś and Oesterdikhoff. They gave a $3$-periodic network that merges two sorted sequences of $N$ numbers in time $12\log N$ and a similar network of period $4$ that works in time $5.67\log N$. We present a new family of such networks that are based on the Canfield and Williamson periodic sorter. Our $3$-periodic merging networks work in time upper-bounded by $6\log N$. The construction can be easily generalized to larger constant periods with decreasing running time, for example, to $4$-periodic ones that work in time upper-bounded by $4\log N$. Moreover, to obtain these results we have introduced a new proof technique.
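
    The idea of a periodic network (a fixed, shallow block of comparators that is re-applied to its own output until the result is sorted) can be sketched with a much simpler, well-known example. The code below uses odd-even transposition, a period-2 network, purely as a stand-in; it is not the merging network constructed in the paper. The names `apply_network` and `run_periodic` and the `max_passes` safety limit are assumptions for illustration.

```python
def apply_network(layers, values):
    """Apply one pass of a comparator network given as layers of (i, j) pairs."""
    a = list(values)
    for layer in layers:
        for i, j in layer:
            if a[i] > a[j]:          # comparator: smaller value goes to wire i
                a[i], a[j] = a[j], a[i]
    return a

def run_periodic(layers, values, max_passes=1000):
    """Re-apply the same network until the output is sorted; return (output, passes)."""
    a = list(values)
    passes = 0
    while any(a[k] > a[k + 1] for k in range(len(a) - 1)):
        if passes >= max_passes:
            raise RuntimeError("network did not sort the input within max_passes")
        a = apply_network(layers, a)
        passes += 1
    return a, passes

def odd_even_transposition(n):
    """Period-2 network on n wires: neighbors (0,1),(2,3),... then (1,2),(3,4),..."""
    even = [(i, i + 1) for i in range(0, n - 1, 2)]
    odd = [(i, i + 1) for i in range(1, n - 1, 2)]
    return [even, odd]

result, passes = run_periodic(odd_even_transposition(8), [8, 7, 6, 5, 4, 3, 2, 1])
print(result, passes)
```

    Here the running time is the period multiplied by the number of passes. For this stand-in it grows linearly in $N$, whereas the networks in the abstract need only a number of steps logarithmic in $N$ to merge two sorted sequences.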