8,593 research outputs found

    An Efficient Multiway Mergesort for GPU Architectures

    Full text link
    Sorting is a primitive operation that is a building block for countless algorithms. As such, it is important to design sorting algorithms that approach peak performance on a range of hardware architectures. Graphics Processing Units (GPUs) are particularly attractive architectures as they provides massive parallelism and computing power. However, the intricacies of their compute and memory hierarchies make designing GPU-efficient algorithms challenging. In this work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway mergesort algorithm. MMS employs a new partitioning technique that exposes the parallelism needed by modern GPU architectures. To the best of our knowledge, MMS is the first sorting algorithm for the GPU that is asymptotically optimal in terms of global memory accesses and that is completely free of shared memory bank conflicts. We realize an initial implementation of MMS, evaluate its performance on three modern GPU architectures, and compare it to competitive implementations available in state-of-the-art GPU libraries. Despite these implementations being highly optimized, MMS compares favorably, achieving performance improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art algorithms are susceptible to bank conflicts. We find that for certain inputs that cause these algorithms to incur large numbers of bank conflicts, MMS can achieve up to a 37.6% speedup over its fastest competitor. Overall, even though its current implementation is not fully optimized, due to its efficient use of the memory hierarchy, MMS outperforms the fastest comparison-based sorting implementations available to date

    A quantum analog of Huffman coding

    Get PDF
    We analyze a generalization of Huffman coding to the quantum case. In particular, we notice various difficulties in using instantaneous codes for quantum communication. Nevertheless, for the storage of quantum information, we have succeeded in constructing a Huffman-coding inspired quantum scheme. The number of computational steps in the encoding and decoding processes of N quantum signals can be made to be of polylogarithmic depth by a massively parallel implementation of a quantum gate array. This is to be compared with the O (N^3) computational steps required in the sequential implementation by Cleve and DiVincenzo of the well-known quantum noiseless block coding scheme of Schumacher. We also show that O(N^2(log N)^a) computational steps are needed for the communication of quantum information using another Huffman-coding inspired scheme where the sender must disentangle her encoding device before the receiver can perform any measurements on his signals.Comment: Revised version, 7 pages, two-column, RevTex. Presented at 1998 IEEE International Symposium on Information Theor

    Quantum Reverse Shannon Theorem

    Get PDF
    Dual to the usual noisy channel coding problem, where a noisy (classical or quantum) channel is used to simulate a noiseless one, reverse Shannon theorems concern the use of noiseless channels to simulate noisy ones, and more generally the use of one noisy channel to simulate another. For channels of nonzero capacity, this simulation is always possible, but for it to be efficient, auxiliary resources of the proper kind and amount are generally required. In the classical case, shared randomness between sender and receiver is a sufficient auxiliary resource, regardless of the nature of the source, but in the quantum case the requisite auxiliary resources for efficient simulation depend on both the channel being simulated, and the source from which the channel inputs are coming. For tensor power sources (the quantum generalization of classical IID sources), entanglement in the form of standard ebits (maximally entangled pairs of qubits) is sufficient, but for general sources, which may be arbitrarily correlated or entangled across channel inputs, additional resources, such as entanglement-embezzling states or backward communication, are generally needed. Combining existing and new results, we establish the amounts of communication and auxiliary resources needed in both the classical and quantum cases, the tradeoffs among them, and the loss of simulation efficiency when auxiliary resources are absent or insufficient. In particular we find a new single-letter expression for the excess forward communication cost of coherent feedback simulations of quantum channels (i.e. simulations in which the sender retains what would escape into the environment in an ordinary simulation), on non-tensor-power sources in the presence of unlimited ebits but no other auxiliary resource. Our results on tensor power sources establish a strong converse to the entanglement-assisted capacity theorem.Comment: 35 pages, to appear in IEEE-IT. v2 has a fixed proof of the Clueless Eve result, a new single-letter formula for the "spread deficit", better error scaling, and an improved strong converse. v3 and v4 each make small improvements to the presentation and add references. v5 fixes broken reference

    The VOISE Algorithm: a Versatile Tool for Automatic Segmentation of Astronomical Images

    Full text link
    The auroras on Jupiter and Saturn can be studied with a high sensitivity and resolution by the Hubble Space Telescope (HST) ultraviolet (UV) and far-ultraviolet (FUV) Space Telescope spectrograph (STIS) and Advanced Camera for Surveys (ACS) instruments. We present results of automatic detection and segmentation of Jupiter's auroral emissions as observed by HST ACS instrument with VOronoi Image SEgmentation (VOISE). VOISE is a dynamic algorithm for partitioning the underlying pixel grid of an image into regions according to a prescribed homogeneity criterion. The algorithm consists of an iterative procedure that dynamically constructs a tessellation of the image plane based on a Voronoi Diagram, until the intensity of the underlying image within each region is classified as homogeneous. The computed tessellations allow the extraction of quantitative information about the auroral features such as mean intensity, latitudinal and longitudinal extents and length scales. These outputs thus represent a more automated and objective method of characterising auroral emissions than manual inspection.Comment: 9 pages, 7 figures; accepted for publication in MNRA
    corecore