A Lower Bound Technique for Communication in BSP
Communication is a major factor determining the performance of algorithms on
current computing systems; it is therefore valuable to provide tight lower
bounds on the communication complexity of computations. This paper presents a
lower bound technique for the communication complexity in the bulk-synchronous
parallel (BSP) model of a given class of DAG computations. The derived bound is
expressed in terms of the switching potential of a DAG, that is, the number of
permutations that the DAG can realize when viewed as a switching network. The
proposed technique yields tight lower bounds for the fast Fourier transform
(FFT), and for any sorting and permutation network. A stronger bound is also
derived for the periodic balanced sorting network, by applying this technique
to suitable subnetworks. Finally, we demonstrate that the switching potential
captures communication requirements even in computational models different from
BSP, such as the I/O model and the LPRAM.
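To make the notion of counting realizable permutations concrete, the following is a minimal sketch under a simplified model that is an assumption of this example, not the paper's formal definition: each network element is treated as a 2x2 switch on a pair of wires that can be set to "straight" or "cross", and the function simply enumerates all settings. The names `realized_permutations` and `butterfly4` are hypothetical.

```python
from itertools import product


def realized_permutations(n, switches):
    """Return the set of permutations of range(n) reachable by setting each
    2x2 switch (i, j) to 'straight' or 'cross', applied in the listed order.

    Simplified model (an assumption of this sketch): a switch either leaves
    its two wires alone or exchanges them.
    """
    perms = set()
    for setting in product((False, True), repeat=len(switches)):
        wires = list(range(n))
        for (i, j), cross in zip(switches, setting):
            if cross:
                wires[i], wires[j] = wires[j], wires[i]
        perms.add(tuple(wires))
    return perms


# 4-input butterfly (the FFT wiring pattern): the first stage pairs wires
# at distance 2, the second stage pairs wires at distance 1.
butterfly4 = [(0, 2), (1, 3), (0, 1), (2, 3)]
butterfly_count = len(realized_permutations(4, butterfly4))

# A 3-wire network that can realize every permutation of its inputs.
full3 = [(0, 1), (1, 2), (0, 1)]
full3_count = len(realized_permutations(3, full3))
```

In this toy model the 4-input butterfly realizes 16 of the 24 possible permutations, while the 3-wire network realizes all 6. Brute-force enumeration is exponential in the number of switches, so it is only useful for building intuition on very small networks.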
Optimized Merge Sort on Modern Commodity Multi-core CPUs
Sorting is one of the most widely used basic algorithms. As high-performance computing devices become increasingly common, more and more commodity machines are capable of parallel computation. A new implementation of sorting is proposed to harness the newer SIMD operations and multi-core computing provided by modern CPUs. The algorithm is a hybrid of an optimized bitonic sorting network and a multi-way merge. New SIMD instructions provided by modern CPUs are used in the bitonic network implementation, which adopts a different method of arranging data so that the number of SIMD operations is reduced. Balanced binary trees are used in the multi-way merge, which also differs from former implementations. Effort is also devoted to minimizing data movement in memory, since merge sort is a memory-bound application. The performance evaluation shows that the proposed algorithm is twice as fast as the sort function in the C++ standard library when only a single thread is used. It also outperforms the radix sort implemented in the Boost library.
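The two-phase structure described above (sort small blocks with a bitonic network, then combine the sorted runs with a multi-way merge) can be sketched as follows. This is a scalar illustration only, not the paper's implementation: it uses no SIMD, and it swaps in the heap-based `heapq.merge` from the Python standard library in place of the balanced-binary-tree merge the abstract mentions. The names `bitonic_sort_block` and `hybrid_sort`, and the block size of 8, are assumptions of this sketch.

```python
import heapq


def bitonic_sort_block(block):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(block)
    n = len(a)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # comparator distance within this stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def hybrid_sort(data, block_size=8):
    """Sort fixed-size blocks with the bitonic network, then k-way merge."""
    runs = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        if len(block) == block_size:
            runs.append(bitonic_sort_block(block))
        else:
            # Tail block is not a power of two: fall back to the built-in sort.
            runs.append(sorted(block))
    return list(heapq.merge(*runs))


sample = [27, 3, 14, 9, 1, 30, 8, 22, 5, 17, 2, 11, 19, 6, 25, 4, 13, 7, 21]
result = hybrid_sort(sample)
```

The comparator pattern of a bitonic network is fixed and independent of the data, which is what makes it a natural fit for SIMD min/max instructions in a real implementation.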
The -Center Problem in Tree Networks Revisited
We present two improved algorithms for weighted discrete -center problem
for tree networks with vertices. One of our proposed algorithms runs in
time. For all values of , our algorithm
thus runs as fast as or faster than the most efficient time
algorithm obtained by applying Cole's speed-up technique [cole1987] to the
algorithm due to Megiddo and Tamir [megiddo1983], which has remained
unchallenged for nearly 30 years. Our other algorithm, which is more practical,
runs in time, and when it is
faster than Megiddo and Tamir's time algorithm
[megiddo1983]
Faster 3-Periodic Merging Networks
We consider the problem of merging two sorted sequences on a comparator
network that is used repeatedly, that is, if the output is not sorted, the
network is applied again using the output as input. The challenging task is to
construct such networks of small depth. The first constructions of merging
networks with a constant period were given by Kutyłowski, Loryś and
Oesterdikhoff. They have given -periodic network that merges two sorted
sequences of numbers in time and a similar network of period
that works in . We present a new family of such networks that are
based on the Canfield and Williamson periodic sorter. Our 3-periodic merging
networks work in time upper-bounded by . The construction can be
easily generalized to larger constant periods with decreasing running time, for
example, to -periodic ones that work in time upper-bounded by .
Moreover, to obtain the facts we have introduced a new proof technique
- …
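The idea of a comparator network "used repeatedly" can be illustrated with a small sketch. It does not reproduce the networks from the abstract above; instead it swaps in the classic odd-even transposition network, whose two comparator layers form one period, and reapplies that period until the output is sorted. The function names `apply_period` and `run_until_sorted` are hypothetical.

```python
def apply_period(a):
    """Apply one period of the network: two fixed comparator layers,
    first on pairs (0,1), (2,3), ... and then on pairs (1,2), (3,4), ..."""
    a = list(a)
    for start in (0, 1):
        for i in range(start, len(a) - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def run_until_sorted(a):
    """Feed the output back into the same network until it is sorted.
    Returns the sorted list and the number of periods that were applied."""
    a = list(a)
    passes = 0
    while any(a[i] > a[i + 1] for i in range(len(a) - 1)):
        a = apply_period(a)
        passes += 1
    return a, passes


# Two sorted sequences placed side by side, as in the merging problem.
merged, passes = run_until_sorted([1, 4, 6, 9, 2, 3, 7, 8])
```

The running time of such a network is the period multiplied by the number of passes, which is why constructions with a small constant period and few passes are of interest.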