On the average running time of odd-even merge sort
This paper is concerned with the average running time of Batcher's odd-even merge sort when implemented on a collection of processors. We consider the case where the size of the input is an arbitrary multiple of the number of processors used. We show that Batcher's odd-even merge (for two sorted lists of equal length) can be implemented to run in time on the average, and that odd-even merge sort can be implemented to run in time on the average. In the case of merging (sorting), the average is taken over all possible outcomes of the merging (all possible permutations of the input elements). That means that odd-even merge and odd-even merge sort have an optimal average running time if . The constants involved are also quite small.
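The abstract does not spell out the network itself. For reference, Batcher's odd-even merge sort can be written as a fixed sequence of compare-exchange pairs. The sketch below generates that sequence in plain, sequential Python and applies it to a list. It assumes the input length is a power of two and does not model the paper's multiprocessor setting, where each processor holds a block of elements and a comparator typically becomes a merge-and-split of two blocks. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def odd_even_merge_sort_network(n: int) -> List[Tuple[int, int]]:
    """Comparator pairs (i, j), i < j, of Batcher's odd-even merge sort for n = 2**k."""
    assert n >= 1 and (n & (n - 1)) == 0, "this sketch assumes n is a power of two"
    pairs = []
    p = 1
    while p < n:              # p: length of the sorted runs being merged
        k = p
        while k >= 1:         # k: distance between the two compared positions
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    # only compare positions that lie in the same merge of two runs
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        pairs.append((i + j, i + j + k))
            k //= 2
        p *= 2
    return pairs


def apply_network(values: Sequence[int], pairs: List[Tuple[int, int]]) -> List[int]:
    """Run the comparators in order; each one puts the smaller value at the lower index."""
    out = list(values)
    for i, j in pairs:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out
```

For example, `apply_network([5, 1, 4, 2, 8, 7, 3, 6], odd_even_merge_sort_network(8))` sorts the eight values with 19 comparators. Because the comparator sequence does not depend on the data, it can be checked exhaustively on all 0/1 inputs (the 0-1 principle).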
Good approximate quantum LDPC codes from spacetime circuit Hamiltonians
We study approximate quantum low-density parity-check (QLDPC) codes, which
are approximate quantum error-correcting codes specified as the ground space of
a frustration-free local Hamiltonian, whose terms do not necessarily commute.
Such codes generalize stabilizer QLDPC codes, which are exact quantum
error-correcting codes with sparse, low-weight stabilizer generators (i.e., each
stabilizer generator acts on a few qubits, and each qubit participates in a few
stabilizer generators). Our investigation is motivated by an important question
in Hamiltonian complexity and quantum coding theory: do stabilizer QLDPC codes
with constant rate, linear distance, and constant-weight stabilizers exist?
We show that obtaining such optimal scaling of parameters (modulo
polylogarithmic corrections) is possible if we go beyond stabilizer codes: we
prove the existence of a family of approximate QLDPC
codes that encode logical qubits into physical
qubits with distance and approximation infidelity
. The code space is
stabilized by a set of 10-local noncommuting projectors, with each physical
qubit only participating in projectors. We
prove the existence of an efficient encoding map, and we show that arbitrary
Pauli errors can be locally detected by circuits of polylogarithmic depth.
Finally, we show that the spectral gap of the code Hamiltonian is
by analyzing a spacetime circuit-to-Hamiltonian
construction for a bitonic sorting network architecture that is spatially local
in dimensions.
Comment: 51 pages, 13 figures
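The sparsity conditions in the definition of stabilizer QLDPC codes (each generator acts on a few qubits, and each qubit participates in a few generators), together with the commutation requirement that the approximate codes above give up, can be checked mechanically when generators are written as Pauli strings. A minimal sketch, assuming generators are given as equal-length strings over `I`, `X`, `Y`, `Z`; the function names are hypothetical and not from the paper.

```python
from itertools import combinations
from typing import List, Sequence


def weight(pauli: str) -> int:
    """Number of qubits on which the Pauli string acts non-trivially."""
    return sum(1 for p in pauli if p != "I")


def qubit_degrees(generators: Sequence[str]) -> List[int]:
    """For each qubit, the number of generators that act on it non-trivially."""
    degrees = [0] * len(generators[0])
    for g in generators:
        for q, p in enumerate(g):
            if p != "I":
                degrees[q] += 1
    return degrees


def commute(a: str, b: str) -> bool:
    """Pauli strings commute iff they anticommute on an even number of qubits.

    On a single qubit, two non-identity Paulis anticommute exactly when they differ.
    """
    clashes = sum(1 for x, y in zip(a, b) if x != "I" and y != "I" and x != y)
    return clashes % 2 == 0


def ldpc_summary(generators: Sequence[str]) -> dict:
    """Maximum generator weight, maximum qubit degree, and pairwise commutation."""
    return {
        "max_weight": max(weight(g) for g in generators),
        "max_degree": max(qubit_degrees(generators)),
        "all_commute": all(commute(a, b) for a, b in combinations(generators, 2)),
    }
```

For the three-qubit repetition code, `ldpc_summary(["ZZI", "IZZ"])` reports weight 2, degree 2, and commuting generators. Adding `"XII"` breaks commutation, which is the kind of check that a code defined by noncommuting projectors no longer satisfies by design.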
A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake
Modern CPU design, which combines hierarchical memory with
SIMD/vectorization capability, governs the potential for algorithms to be
transformed into efficient implementations. The release of AVX-512 changed
things radically and motivated us to search for an efficient sorting algorithm
that can take advantage of it. In this paper, we describe the best strategy we
have found, which is a novel two-part hybrid sort based on the well-known
Quicksort algorithm. The central partitioning operation is performed by a new
algorithm, and small partitions/arrays are sorted using a branch-free
Bitonic-based sort. This study is also an illustration of how classical
algorithms can be adapted and enhanced by the AVX-512 extension. We evaluate
the performance of our approach on a modern Intel Xeon Skylake and assess the
different layers of our implementation by sorting/partitioning integers,
double-precision floating-point numbers, and key/value pairs of integers. Our
results demonstrate that our approach is faster than two reference libraries:
the GNU C++ sort algorithm, by a speedup factor of 4, and the Intel IPP
library, by a speedup factor of 1.4.
Comment: 8 pages, research paper
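The two-part structure described above (a partitioning step, plus a fixed Bitonic network for small arrays) can be illustrated in plain Python. This is only a sketch of the data flow, not the paper's algorithm. The partition compares one vector-width block against the pivot at a time and keeps the matching lanes in order, which is roughly what AVX-512 compress-store instructions enable. Unlike a real in-register, in-place implementation, it is out-of-place. `VECTOR_WIDTH`, `SMALL_SIZE`, padding with `+inf`, median-of-three pivoting, and all function names are assumptions for illustration.

```python
from typing import List, Tuple

VECTOR_WIDTH = 16  # assumption: 16 lanes of 32-bit integers per 512-bit register
SMALL_SIZE = 16    # assumption: partitions at or below this size use the bitonic sort


def bitonic_sort_padded(values: List[float]) -> List[float]:
    """Sort a small list with a bitonic network after padding to a power of two.

    Padding with +inf keeps the pad at the tail, so it is sliced off at the end.
    """
    if not values:
        return []
    n = len(values)
    size = 1 << (n - 1).bit_length()
    a = list(values) + [float("inf")] * (size - n)
    k = 2
    while k <= size:
        j = k // 2
        while j > 0:
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    lo, hi = min(a[i], a[partner]), max(a[i], a[partner])
                    # sort direction depends only on the index, never on the data
                    if (i & k) == 0:
                        a[i], a[partner] = lo, hi
                    else:
                        a[i], a[partner] = hi, lo
            j //= 2
        k *= 2
    return a[:n]


def compress(block: List[float], mask: List[bool]) -> List[float]:
    """Scalar stand-in for a vector compress: keep the masked lanes, in order."""
    return [x for x, keep in zip(block, mask) if keep]


def partition_by_blocks(values: List[float], pivot: float) -> Tuple[list, list, list]:
    """Three-way partition that visits the input one vector-width block at a time."""
    less, equal, greater = [], [], []
    for start in range(0, len(values), VECTOR_WIDTH):
        block = values[start:start + VECTOR_WIDTH]
        lt_mask = [x < pivot for x in block]
        gt_mask = [x > pivot for x in block]
        eq_mask = [not (lt or gt) for lt, gt in zip(lt_mask, gt_mask)]
        less += compress(block, lt_mask)
        equal += compress(block, eq_mask)
        greater += compress(block, gt_mask)
    return less, equal, greater


def hybrid_quicksort(values: List[float]) -> List[float]:
    if len(values) <= SMALL_SIZE:
        return bitonic_sort_padded(values)
    # Median-of-three pivot: an assumption, not taken from the paper.
    pivot = sorted([values[0], values[len(values) // 2], values[-1]])[1]
    less, equal, greater = partition_by_blocks(values, pivot)
    # The pivot is in `equal`, so both recursive calls receive strictly fewer elements.
    return hybrid_quicksort(less) + equal + hybrid_quicksort(greater)
```

The three-way split is a design choice for the sketch. Because the pivot is always taken from the input, every recursion makes progress, even on inputs with many duplicate keys.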
An empirical evaluation of High-Level Synthesis languages and tools for database acceleration
High-Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS tool for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management System (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we use to accelerate commonly used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS tool depends on a set of orthogonal characteristics, which we highlight for each HLS framework.
Peer Reviewed. Postprint (author’s final draft)