38,889 research outputs found

    An Enhanced Multiway Sorting Network Based on n-Sorters

    Full text link
    Merging-based sorting networks are an important family of sorting networks. Most merge sorting networks are based on 2-way or multi-way merging algorithms using 2-sorters as basic building blocks. An alternative is to use n-sorters, instead of 2-sorters, as the basic building blocks so as to greatly reduce the number of sorters as well as the latency. Based on a modified Leighton's columnsort algorithm, an n-way merging algorithm, referred to as SS-Mk, that uses n-sorters as basic building blocks was proposed. In this work, we first propose a new multiway merging algorithm with n-sorters as basic building blocks that merges n sorted lists of m values each in 1 + ceil(m/2) stages (n <= m). Based on our merging algorithm, we also propose a sorting algorithm, which requires O(N log2 N) basic sorters to sort N inputs. While the asymptotic complexity (in terms of the required number of sorters) of our sorting algorithm is the same as the SS-Mk, for wide ranges of N, our algorithm requires fewer sorters than the SS-Mk. Finally, we consider a binary sorting network, where the basic sorter is implemented in threshold logic and scales linearly with the number of inputs, and compare the complexity in terms of the required number of gates. For wide ranges of N, our algorithm requires fewer gates than the SS-Mk.Comment: 13 pages, 14 figure

    A Bulk-Parallel Priority Queue in External Memory with STXXL

    Get PDF
    We propose the design and an implementation of a bulk-parallel external memory priority queue to take advantage of both shared-memory parallelism and high external memory transfer speeds to parallel disks. To achieve higher performance by decoupling item insertions and extractions, we offer two parallelization interfaces: one using "bulk" sequences, the other by defining "limit" items. In the design, we discuss how to parallelize insertions using multiple heaps, and how to calculate a dynamic prediction sequence to prefetch blocks and apply parallel multiway merge for extraction. Our experimental results show that in the selected benchmarks the priority queue reaches 75% of the full parallel I/O bandwidth of rotational disks and and 65% of SSDs, or the speed of sorting in external memory when bounded by computation.Comment: extended version of SEA'15 conference pape

    Sorting Integers on the AP1000

    Full text link
    Sorting is one of the classic problems of computer science. Whilst well understood on sequential machines, the diversity of architectures amongst parallel systems means that algorithms do not perform uniformly on all platforms. This document describes the implementation of a radix based algorithm for sorting positive integers on a Fujitsu AP1000 Supercomputer, which was constructed as an entry in the Joint Symposium on Parallel Processing (JSPP) 1994 Parallel Software Contest (PSC94). Brief consideration is also given to a full radix sort conducted in parallel across the machine.Comment: 1994 Project Report, 23 page

    GPU LSM: A Dynamic Dictionary Data Structure for the GPU

    Full text link
    We develop a dynamic dictionary data structure for the GPU, supporting fast insertions and deletions, based on the Log Structured Merge tree (LSM). Our implementation on an NVIDIA K40c GPU has an average update (insertion or deletion) rate of 225 M elements/s, 13.5x faster than merging items into a sorted array. The GPU LSM supports the retrieval operations of lookup, count, and range query operations with an average rate of 75 M, 32 M and 23 M queries/s respectively. The trade-off for the dynamic updates is that the sorted array is almost twice as fast on retrievals. We believe that our GPU LSM is the first dynamic general-purpose dictionary data structure for the GPU.Comment: 11 pages, accepted to appear on the Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS'18

    Fast evaluation of union-intersection expressions

    Full text link
    We show how to represent sets in a linear space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in e.g. information retrieval and database systems. We mainly consider the RAM model of computation, and sets of machine words, but also state our results in the I/O model. On a RAM with word size ww, a special case of our result is that the intersection of mm (preprocessed) sets, containing nn elements in total, can be computed in expected time O(n(logw)2/w+km)O(n (\log w)^2 / w + km), where kk is the number of elements in the intersection. If the first of the two terms dominates, this is a factor w1o(1)w^{1-o(1)} faster than the standard solution of merging sorted lists. We show a cell probe lower bound of time Ω(n/(wmlogm)+(1logkw)k)\Omega(n/(w m \log m)+ (1-\tfrac{\log k}{w}) k), meaning that our upper bound is nearly optimal for small mm. Our algorithm uses a novel combination of approximate set representations and word-level parallelism

    Graphical Reasoning in Compact Closed Categories for Quantum Computation

    Full text link
    Compact closed categories provide a foundational formalism for a variety of important domains, including quantum computation. These categories have a natural visualisation as a form of graphs. We present a formalism for equational reasoning about such graphs and develop this into a generic proof system with a fixed logical kernel for equational reasoning about compact closed categories. Automating this reasoning process is motivated by the slow and error prone nature of manual graph manipulation. A salient feature of our system is that it provides a formal and declarative account of derived results that can include `ellipses'-style notation. We illustrate the framework by instantiating it for a graphical language of quantum computation and show how this can be used to perform symbolic computation.Comment: 21 pages, 9 figures. This is the journal version of the paper published at AIS
    corecore