554 research outputs found

    An empirical evaluation of High-Level Synthesis languages and tools for database acceleration

    High-Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS tool for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we use to accelerate commonly used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS tool depends on a set of orthogonal characteristics, which we highlight for each HLS framework. Peer reviewed. Postprint (author’s final draft).

    Good approximate quantum LDPC codes from spacetime circuit Hamiltonians

    We study approximate quantum low-density parity-check (QLDPC) codes, which are approximate quantum error-correcting codes specified as the ground space of a frustration-free local Hamiltonian, whose terms do not necessarily commute. Such codes generalize stabilizer QLDPC codes, which are exact quantum error-correcting codes with sparse, low-weight stabilizer generators (i.e. each stabilizer generator acts on a few qubits, and each qubit participates in a few stabilizer generators). Our investigation is motivated by an important question in Hamiltonian complexity and quantum coding theory: do stabilizer QLDPC codes with constant rate, linear distance, and constant-weight stabilizers exist? We show that obtaining such optimal scaling of parameters (modulo polylogarithmic corrections) is possible if we go beyond stabilizer codes: we prove the existence of a family of $[[N,k,d,\varepsilon]]$ approximate QLDPC codes that encode $k = \widetilde{\Omega}(N)$ logical qubits into $N$ physical qubits with distance $d = \widetilde{\Omega}(N)$ and approximation infidelity $\varepsilon = \mathcal{O}(1/\textrm{polylog}(N))$. The code space is stabilized by a set of 10-local noncommuting projectors, with each physical qubit only participating in $\mathcal{O}(\textrm{polylog}\, N)$ projectors. We prove the existence of an efficient encoding map, and we show that arbitrary Pauli errors can be locally detected by circuits of polylogarithmic depth. Finally, we show that the spectral gap of the code Hamiltonian is $\widetilde{\Omega}(N^{-3.09})$ by analyzing a spacetime circuit-to-Hamiltonian construction for a bitonic sorting network architecture that is spatially local in $\textrm{polylog}(N)$ dimensions. Comment: 51 pages, 13 figures

    Algebraic study of generalization and redundancy of the bitonic sorter.

    Qian Zhengfeng. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 127-129). Abstracts in English and Chinese.

    Chapter 1 --- Groundwork
        1.1 --- Introduction
        1.2 --- Exchange Patterns
        1.3 --- Multistage Networks
            1.3.1 --- Multistage Networks
            1.3.2 --- Banyan-type Networks
        1.4 --- Networks of Sorting Cells
        1.5 --- Symbolic Representation and Matrix Representation
            1.5.1 --- Symbolic Representation of a Multistage Interconnection Network
            1.5.2 --- Symbolic Representation of a Network of Sorting Cells
            1.5.3 --- Matrix Representation of a Network of Sorting Cells
        1.6 --- Summary
    Chapter 2 --- Construction of Generalized Bitonic Sorters by Merging Rotated Monotonic Sequences
        2.1 --- Merging Networks
            2.1.1 --- Recursive 2-stage Construction
            2.1.2 --- UC/CU Non-blocking Switches
            2.1.3 --- Circular Sorters and Merging Networks
        2.2 --- Construction of Generalized Bitonic Sorters
            2.2.1 --- Bitonic Ar-sorters and Bitonic Dr-sorters
            2.2.2 --- Algorithms for Construction of Generalized Bitonic Sorters by Merging Rotated Monotonic Sequences
        2.3 --- Summary
    Chapter 3 --- Construction of Generalized Bitonic Sorters by Cross-k Cell Rearrangement
        3.1 --- Cross-k Cell Rearrangement on a Multistage Network
            3.1.1 --- Intra-stage Cell Rearrangement
            3.1.2 --- Equivalence of Networks
            3.1.3 --- Cross-k Cell Rearrangement
        3.2 --- Construction of Generalized Bitonic Sorters
        3.3 --- Summary
    Chapter 4 --- Redundancy of the Bitonic Network
        4.1 --- Counting of Identified Generalized Bitonic Sorters
        4.2 --- Redundancy of the Bitonic Network
    Epilogue
    Appendix C Program for Exhaustive Search of 8×8 Generalized Bitonic Sorters
    References
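For orientation, the classic bitonic sorter that the thesis generalizes can be sketched as a list of comparator layers. This is a minimal sketch of the textbook Batcher construction, not the generalized sorters studied in the thesis; the function names `bitonic_layers`, `apply_network` and `sorts_all_binary_inputs` are hypothetical. The check relies on the zero-one principle: a comparator network sorts every input if it sorts every sequence of 0s and 1s.

```python
from itertools import product


def bitonic_layers(n):
    """Return the comparator layers of a bitonic sorting network on n wires.

    n must be a power of two. Each layer is a list of (low, high) wire pairs
    that can fire in parallel; after a comparator fires, the smaller value
    sits on wire `low`.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    layers = []
    k = 2
    while k <= n:              # k: length of the runs produced by this merge phase
        j = k // 2
        while j >= 1:          # j: distance between the two wires of a comparator
            layer = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Neighbouring blocks of size k sort in opposite directions,
                    # so that each pair of blocks forms a bitonic sequence.
                    ascending = (i & k) == 0
                    layer.append((i, partner) if ascending else (partner, i))
            layers.append(layer)
            j //= 2
        k *= 2
    return layers


def apply_network(layers, values):
    """Run the comparator layers over a copy of `values` and return the result."""
    out = list(values)
    for layer in layers:
        for low, high in layer:
            if out[low] > out[high]:
                out[low], out[high] = out[high], out[low]
    return out


def sorts_all_binary_inputs(n):
    """Exhaustively test the n-wire network on all 2**n binary inputs."""
    layers = bitonic_layers(n)
    return all(
        apply_network(layers, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )
```

For 8 wires this yields 1 + 2 + 3 = 6 layers of 4 comparators each, and the exhaustive check only needs 256 binary inputs.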

    A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake

    The modern CPU's design, which is composed of hierarchical memory and SIMD/vectorization capability, governs the potential for algorithms to be transformed into efficient implementations. The release of AVX-512 changed things radically, and motivated us to search for an efficient sorting algorithm that can take advantage of it. In this paper, we describe the best strategy we have found, which is a novel two-part hybrid sort based on the well-known Quicksort algorithm. The central partitioning operation is performed by a new algorithm, and small partitions/arrays are sorted using a branch-free Bitonic-based sort. This study is also an illustration of how classical algorithms can be adapted and enhanced by the AVX-512 extension. We evaluate the performance of our approach on a modern Intel Xeon Skylake and assess the different layers of our implementation by sorting/partitioning integers, double floating-point numbers, and key/value pairs of integers. Our results demonstrate that our approach is faster than two reference libraries: the GNU C++ sort algorithm by a speedup factor of 4, and the Intel IPP library by a speedup factor of 1.4. Comment: 8 pages, research paper
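The two-part structure described in this abstract (Quicksort partitioning on large ranges, a bitonic network on small ones) can be illustrated in scalar form. This is only a sketch under stated assumptions, not the paper's AVX-512 implementation: the partition step below is an ordinary three-way partition swapped in for the paper's new algorithm, the cutoff of 16 is an arbitrary choice, and the names `bitonic_sort_small` and `hybrid_quicksort` are hypothetical.

```python
def bitonic_sort_small(block):
    """Sort a short list with a bitonic network built from min/max exchanges.

    The block is padded to a power-of-two length with copies of its maximum,
    which end up at the tail and are cut off again.
    """
    n = len(block)
    if n < 2:
        return list(block)
    size = 1 << (n - 1).bit_length()          # next power of two >= n
    work = list(block) + [max(block)] * (size - n)
    k = 2
    while k <= size:
        j = k // 2
        while j >= 1:
            for i in range(size):
                p = i ^ j
                if p > i:
                    # The exchange itself uses min/max instead of a branch on
                    # the data; the direction test depends only on the index.
                    small, large = min(work[i], work[p]), max(work[i], work[p])
                    if (i & k) == 0:
                        work[i], work[p] = small, large
                    else:
                        work[i], work[p] = large, small
            j //= 2
        k *= 2
    return work[:n]


def hybrid_quicksort(values, cutoff=16):
    """Return a sorted copy: quicksort partitioning above `cutoff`, bitonic below."""
    a = list(values)
    stack = [(0, len(a))]                     # half-open ranges still to sort
    while stack:
        lo, hi = stack.pop()
        if hi - lo <= cutoff:
            a[lo:hi] = bitonic_sort_small(a[lo:hi])
            continue
        # Median-of-three pivot (an assumption; the abstract does not specify one).
        pivot = sorted((a[lo], a[(lo + hi) // 2], a[hi - 1]))[1]
        # Three-way partition into < pivot, == pivot, > pivot.
        lt, i, gt = lo, lo, hi
        while i < gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                gt -= 1
                a[gt], a[i] = a[i], a[gt]
            else:
                i += 1
        stack.append((lo, lt))
        stack.append((gt, hi))
    return a
```

In a vectorized version the inner `for i` loop of the network would map onto SIMD min/max and permutation instructions, which is why a fixed comparator pattern suits small arrays.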

    An Enhanced Multiway Sorting Network Based on n-Sorters

    Merging-based sorting networks are an important family of sorting networks. Most merge sorting networks are based on 2-way or multi-way merging algorithms using 2-sorters as basic building blocks. An alternative is to use n-sorters, instead of 2-sorters, as the basic building blocks so as to greatly reduce the number of sorters as well as the latency. Based on a modified Leighton's columnsort algorithm, an n-way merging algorithm, referred to as SS-Mk, that uses n-sorters as basic building blocks was proposed. In this work, we first propose a new multiway merging algorithm with n-sorters as basic building blocks that merges $n$ sorted lists of $m$ values each in $1 + \lceil m/2 \rceil$ stages ($n \le m$). Based on our merging algorithm, we also propose a sorting algorithm, which requires $O(N \log^2 N)$ basic sorters to sort $N$ inputs. While the asymptotic complexity (in terms of the required number of sorters) of our sorting algorithm is the same as that of the SS-Mk, for wide ranges of $N$, our algorithm requires fewer sorters than the SS-Mk. Finally, we consider a binary sorting network, where the basic sorter is implemented in threshold logic and scales linearly with the number of inputs, and compare the complexity in terms of the required number of gates. For wide ranges of $N$, our algorithm requires fewer gates than the SS-Mk. Comment: 13 pages, 14 figures
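The stage count quoted in this abstract is easy to evaluate for concrete sizes. The helper below is a small sketch that only encodes the stated formula and its stated condition; the name `merge_stages` is hypothetical and nothing here reproduces the merging algorithm itself.

```python
def merge_stages(n, m):
    """Stages to merge n sorted lists of m values each, per the abstract's formula.

    Computes 1 + ceil(m / 2) and enforces the stated condition n <= m.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    if n > m:
        raise ValueError("the formula is stated for n <= m")
    return 1 + (m + 1) // 2    # (m + 1) // 2 equals ceil(m / 2) for integers
```

For example, merging 4 sorted lists of 8 values each gives 1 + 4 = 5 stages, and lists of 7 values give the same count because of the ceiling.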