117,446 research outputs found
An Efficient Multiway Mergesort for GPU Architectures
Sorting is a primitive operation that is a building block for countless
algorithms. As such, it is important to design sorting algorithms that approach
peak performance on a range of hardware architectures. Graphics Processing
Units (GPUs) are particularly attractive architectures as they provides massive
parallelism and computing power. However, the intricacies of their compute and
memory hierarchies make designing GPU-efficient algorithms challenging. In this
work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway
mergesort algorithm. MMS employs a new partitioning technique that exposes the
parallelism needed by modern GPU architectures. To the best of our knowledge,
MMS is the first sorting algorithm for the GPU that is asymptotically optimal
in terms of global memory accesses and that is completely free of shared memory
bank conflicts.
We realize an initial implementation of MMS, evaluate its performance on
three modern GPU architectures, and compare it to competitive implementations
available in state-of-the-art GPU libraries. Despite these implementations
being highly optimized, MMS compares favorably, achieving performance
improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art
algorithms are susceptible to bank conflicts. We find that for certain inputs
that cause these algorithms to incur large numbers of bank conflicts, MMS can
achieve up to a 37.6% speedup over its fastest competitor. Overall, even though
its current implementation is not fully optimized, due to its efficient use of
the memory hierarchy, MMS outperforms the fastest comparison-based sorting
implementations available to date
An Enhanced Multiway Sorting Network Based on n-Sorters
Merging-based sorting networks are an important family of sorting networks.
Most merge sorting networks are based on 2-way or multi-way merging algorithms
using 2-sorters as basic building blocks. An alternative is to use n-sorters,
instead of 2-sorters, as the basic building blocks so as to greatly reduce the
number of sorters as well as the latency. Based on a modified Leighton's
columnsort algorithm, an n-way merging algorithm, referred to as SS-Mk, that
uses n-sorters as basic building blocks was proposed. In this work, we first
propose a new multiway merging algorithm with n-sorters as basic building
blocks that merges n sorted lists of m values each in 1 + ceil(m/2) stages (n
<= m). Based on our merging algorithm, we also propose a sorting algorithm,
which requires O(N log2 N) basic sorters to sort N inputs. While the asymptotic
complexity (in terms of the required number of sorters) of our sorting
algorithm is the same as the SS-Mk, for wide ranges of N, our algorithm
requires fewer sorters than the SS-Mk. Finally, we consider a binary sorting
network, where the basic sorter is implemented in threshold logic and scales
linearly with the number of inputs, and compare the complexity in terms of the
required number of gates. For wide ranges of N, our algorithm requires fewer
gates than the SS-Mk.Comment: 13 pages, 14 figure
Fast Multi-Scale Community Detection based on Local Criteria within a Multi-Threaded Algorithm
Many systems can be described using graphs, or networks. Detecting
communities in these networks can provide information about the underlying
structure and functioning of the original systems. Yet this detection is a
complex task and a large amount of work was dedicated to it in the past decade.
One important feature is that communities can be found at several scales, or
levels of resolution, indicating several levels of organisations. Therefore
solutions to the community structure may not be unique. Also networks tend to
be large and hence require efficient processing. In this work, we present a new
algorithm for the fast detection of communities across scales using a local
criterion. We exploit the local aspect of the criterion to enable parallel
computation and improve the algorithm's efficiency further. The algorithm is
tested against large generated multi-scale networks and experiments demonstrate
its efficiency and accuracy.Comment: arXiv admin note: text overlap with arXiv:1204.100
- …