273 research outputs found

    On the average running time of odd-even merge sort

    This paper is concerned with the average running time of Batcher's odd-even merge sort when implemented on a collection of processors. We consider the case where $n$, the size of the input, is an arbitrary multiple of the number $p$ of processors used. We show that Batcher's odd-even merge (for two sorted lists of length $n$ each) can be implemented to run in time $O((n/p)(\log (2+p^2/n)))$ on the average, and that odd-even merge sort can be implemented to run in time $O((n/p)(\log n+\log p\log (2+p^2/n)))$ on the average. In the case of merging (sorting), the average is taken over all possible outcomes of the merging (all possible permutations of $n$ elements). That means that odd-even merge and odd-even merge sort have an optimal average running time if $n\geq p^2$. The constants involved are also quite small.
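
    For orientation, the comparator sequence of Batcher's odd-even merge sort can be sketched as follows. This is a minimal sequential sketch, not the multiprocessor implementation analysed in the paper; it assumes the input length is a power of two, and the function names are illustrative.

```python
def odd_even_merge(lo, n, r):
    """Yield comparators that merge the span of n positions starting at lo,
    considering only positions spaced r apart (assumes n is a power of two)."""
    step = r * 2
    if step < n:
        yield from odd_even_merge(lo, n, step)      # even-indexed subsequence
        yield from odd_even_merge(lo + r, n, step)  # odd-indexed subsequence
        for i in range(lo + r, lo + n - r, step):
            yield (i, i + r)                        # final clean-up comparators
    else:
        yield (lo, lo + r)


def odd_even_merge_sort_pairs(lo, n):
    """Yield the full comparator sequence for sorting n positions from lo."""
    if n > 1:
        m = n // 2
        yield from odd_even_merge_sort_pairs(lo, m)
        yield from odd_even_merge_sort_pairs(lo + m, m)
        yield from odd_even_merge(lo, n, 1)


def odd_even_merge_sort(values):
    """Sort a list whose length is a power of two by applying the network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two input length")
    out = list(values)
    for i, j in odd_even_merge_sort_pairs(0, n):
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]  # compare-exchange
    return out
```

    The comparator sequence depends only on the input length, never on the data, which is what allows the compare-exchange steps to be distributed over processors.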

    An empirical evaluation of High-Level Synthesis languages and tools for database acceleration

    High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework. Peer reviewed. Postprint (author’s final draft).

    An FPGA Implementation of Kak's Instantaneously-Trained, Fast-Classification Neural Networks

    Motivated by a biologically plausible short-memory sketchpad, Kak's Fast Classification (FC) neural networks are instantaneously trained by using a prescriptive training scheme. Both weights and the topology for an FC network are specified with only two presentations of the training samples. Compared with iterative learning algorithms such as Backpropagation (which may require many thousands of presentations of the training data), the training of FC networks is extremely fast and learning convergence is always guaranteed. Thus FC networks are suitable for applications where real-time classification and adaptive filtering are needed. In this paper we show that FC networks are "hardware friendly" for implementation on FPGAs. Their unique prescriptive learning scheme can be integrated with the hardware design of the FC network through parameterization and compile-time constant folding.

    A taxonomy of parallel sorting

    TR 84-601. In this paper, we propose a taxonomy of parallel sorting that includes a broad range of array and file sorting algorithms. We analyze the evolution of research on parallel sorting, from the earliest sorting networks to the shared memory algorithms and the VLSI sorters. In the context of sorting networks, we describe two fundamental parallel merging schemes: the odd-even and the bitonic merge. Sorting algorithms have been derived from these merging algorithms for parallel computers where processors communicate through interconnection networks such as the perfect shuffle, the mesh and a number of other sparse networks. After describing the network sorting algorithms, we show that, with a shared memory model of parallel computation, faster algorithms have been derived from parallel enumeration sorting schemes, where keys are first ranked and then rearranged according to their rank.
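
    The rank-then-rearrange idea behind enumeration sorting can be illustrated with a small sequential sketch. It is not taken from the report; the function name and the tie-breaking rule (equal keys ordered by original index) are assumptions made so that every key receives a distinct rank.

```python
def enumeration_sort(keys):
    """Sort by first ranking every key, then placing each key at its rank."""
    n = len(keys)
    # Rank of key i = number of keys that must precede it.
    # Ties are broken by original index so that ranks are unique.
    ranks = [
        sum(
            1
            for j in range(n)
            if keys[j] < keys[i] or (keys[j] == keys[i] and j < i)
        )
        for i in range(n)
    ]
    # Rearrangement step: each key moves directly to its final position.
    out = [None] * n
    for i, rank in enumerate(ranks):
        out[rank] = keys[i]
    return out
```

    Each rank is computed independently of the others, so in a shared memory setting the ranking loop is the part that would be spread across processors.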