An Efficient Multiway Mergesort for GPU Architectures
Sorting is a primitive operation that is a building block for countless
algorithms. As such, it is important to design sorting algorithms that approach
peak performance on a range of hardware architectures. Graphics Processing
Units (GPUs) are particularly attractive architectures as they provide massive
parallelism and computing power. However, the intricacies of their compute and
memory hierarchies make designing GPU-efficient algorithms challenging. In this
work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway
mergesort algorithm. MMS employs a new partitioning technique that exposes the
parallelism needed by modern GPU architectures. To the best of our knowledge,
MMS is the first sorting algorithm for the GPU that is asymptotically optimal
in terms of global memory accesses and that is completely free of shared memory
bank conflicts.
We realize an initial implementation of MMS, evaluate its performance on
three modern GPU architectures, and compare it to competitive implementations
available in state-of-the-art GPU libraries. Despite these implementations
being highly optimized, MMS compares favorably, achieving performance
improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art
algorithms are susceptible to bank conflicts. We find that for certain inputs
that cause these algorithms to incur large numbers of bank conflicts, MMS can
achieve up to a 37.6% speedup over its fastest competitor. Overall, even though
its current implementation is not fully optimized, due to its efficient use of
the memory hierarchy, MMS outperforms the fastest comparison-based sorting
implementations available to date.
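
To make the term "multiway mergesort" concrete, here is a minimal sequential sketch in Python. It is not the MMS algorithm: the abstract does not describe the partitioning technique or the GPU memory layout, so none of that is modeled here. The function names and the parameters `k` (fan-out) and `base` (cutoff for the base case) are assumptions chosen for illustration.

```python
import heapq
from typing import List, Sequence


def multiway_merge(runs: Sequence[Sequence[int]]) -> List[int]:
    """Merge k sorted runs into one sorted list using a min-heap."""
    # Each heap entry is (value, run index, position inside that run).
    heap = [(run[0], run_id, 0) for run_id, run in enumerate(runs) if run]
    heapq.heapify(heap)
    merged: List[int] = []
    while heap:
        value, run_id, pos = heapq.heappop(heap)
        merged.append(value)
        nxt = pos + 1
        if nxt < len(runs[run_id]):
            heapq.heappush(heap, (runs[run_id][nxt], run_id, nxt))
    return merged


def multiway_mergesort(data: Sequence[int], k: int = 4, base: int = 8) -> List[int]:
    """Sort by splitting into up to k runs, sorting each, and merging them."""
    if k < 2 or base < 1:
        raise ValueError("k must be >= 2 and base must be >= 1")
    if len(data) <= base:
        return sorted(data)  # small inputs: fall back to the built-in sort
    chunk = -(-len(data) // k)  # ceiling division
    runs = [
        multiway_mergesort(data[i:i + chunk], k, base)
        for i in range(0, len(data), chunk)
    ]
    return multiway_merge(runs)
```

Merging `k` runs at a time rather than two reduces the number of passes over the data from about log2(n) to about logk(n), which is the general reason multiway merging is attractive when memory traffic dominates.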
A quantum analog of Huffman coding
We analyze a generalization of Huffman coding to the quantum case. In
particular, we notice various difficulties in using instantaneous codes for
quantum communication. Nevertheless, for the storage of quantum information, we
have succeeded in constructing a Huffman-coding inspired quantum scheme. The
number of computational steps in the encoding and decoding processes of N
quantum signals can be made to be of polylogarithmic depth by a massively
parallel implementation of a quantum gate array. This is to be compared with
the O(N^3) computational steps required in the sequential implementation by
Cleve and DiVincenzo of the well-known quantum noiseless block coding scheme of
Schumacher. We also show that O(N^2(log N)^a) computational steps are needed
for the communication of quantum information using another Huffman-coding
inspired scheme where the sender must disentangle her encoding device before
the receiver can perform any measurements on his signals.
Comment: Revised version, 7 pages, two-column, RevTex. Presented at 1998 IEEE
International Symposium on Information Theor
Quantum Reverse Shannon Theorem
Dual to the usual noisy channel coding problem, where a noisy (classical or
quantum) channel is used to simulate a noiseless one, reverse Shannon theorems
concern the use of noiseless channels to simulate noisy ones, and more
generally the use of one noisy channel to simulate another. For channels of
nonzero capacity, this simulation is always possible, but for it to be
efficient, auxiliary resources of the proper kind and amount are generally
required. In the classical case, shared randomness between sender and receiver
is a sufficient auxiliary resource, regardless of the nature of the source, but
in the quantum case the requisite auxiliary resources for efficient simulation
depend on both the channel being simulated, and the source from which the
channel inputs are coming. For tensor power sources (the quantum generalization
of classical IID sources), entanglement in the form of standard ebits
(maximally entangled pairs of qubits) is sufficient, but for general sources,
which may be arbitrarily correlated or entangled across channel inputs,
additional resources, such as entanglement-embezzling states or backward
communication, are generally needed. Combining existing and new results, we
establish the amounts of communication and auxiliary resources needed in both
the classical and quantum cases, the tradeoffs among them, and the loss of
simulation efficiency when auxiliary resources are absent or insufficient. In
particular we find a new single-letter expression for the excess forward
communication cost of coherent feedback simulations of quantum channels (i.e.
simulations in which the sender retains what would escape into the environment
in an ordinary simulation), on non-tensor-power sources in the presence of
unlimited ebits but no other auxiliary resource. Our results on tensor power
sources establish a strong converse to the entanglement-assisted capacity
theorem.
Comment: 35 pages, to appear in IEEE-IT. v2 has a fixed proof of the Clueless
Eve result, a new single-letter formula for the "spread deficit", better
error scaling, and an improved strong converse. v3 and v4 each make small
improvements to the presentation and add references. v5 fixes broken
reference
The VOISE Algorithm: a Versatile Tool for Automatic Segmentation of Astronomical Images
The auroras on Jupiter and Saturn can be studied with high sensitivity and
resolution by the Hubble Space Telescope (HST) ultraviolet (UV) and
far-ultraviolet (FUV) Space Telescope Imaging Spectrograph (STIS) and Advanced Camera
for Surveys (ACS) instruments. We present results of automatic detection and
segmentation of Jupiter's auroral emissions as observed by the HST ACS instrument
with VOronoi Image SEgmentation (VOISE). VOISE is a dynamic algorithm for
partitioning the underlying pixel grid of an image into regions according to a
prescribed homogeneity criterion. The algorithm consists of an iterative
procedure that dynamically constructs a tessellation of the image plane based
on a Voronoi Diagram, until the intensity of the underlying image within each
region is classified as homogeneous. The computed tessellations allow the
extraction of quantitative information about the auroral features such as mean
intensity, latitudinal and longitudinal extents and length scales. These
outputs thus represent a more automated and objective method of characterising
auroral emissions than manual inspection.
Comment: 9 pages, 7 figures; accepted for publication in MNRA
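
The core idea described above, assigning each pixel to a Voronoi region and testing each region for homogeneity, can be sketched in a few lines of Python. This is only an illustration and not the VOISE implementation. The abstract does not specify the homogeneity criterion or how seeds are added and removed, so the criterion used here (intensity range within a region compared against `threshold`) and all function names are assumptions.

```python
from math import inf
from typing import List, Sequence, Tuple

Seed = Tuple[int, int]  # (row, column) of a Voronoi seed


def voronoi_labels(height: int, width: int, seeds: Sequence[Seed]) -> List[List[int]]:
    """Label every pixel with the index of its nearest seed (brute force)."""
    labels = []
    for r in range(height):
        row = []
        for c in range(width):
            nearest = min(
                range(len(seeds)),
                key=lambda i: (r - seeds[i][0]) ** 2 + (c - seeds[i][1]) ** 2,
            )
            row.append(nearest)
        labels.append(row)
    return labels


def inhomogeneous_regions(
    image: Sequence[Sequence[float]], seeds: Sequence[Seed], threshold: float
) -> List[int]:
    """Return the indices of regions whose intensity range exceeds threshold.

    Assumed criterion: a region is homogeneous when max - min <= threshold.
    """
    labels = voronoi_labels(len(image), len(image[0]), seeds)
    lo = [inf] * len(seeds)
    hi = [-inf] * len(seeds)
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            i = labels[r][c]
            lo[i] = min(lo[i], value)
            hi[i] = max(hi[i], value)
    return [i for i in range(len(seeds)) if hi[i] - lo[i] > threshold]
```

An iterative scheme in this spirit would add seeds inside the regions returned by `inhomogeneous_regions` and repeat until the list is empty; the actual refinement rules used by VOISE are not given in the abstract.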