138 research outputs found

    A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake

    Full text link
    The modern CPU's design, which is composed of hierarchical memory and SIMD/vectorization capability, governs the potential for algorithms to be transformed into efficient implementations. The release of the AVX-512 changed things radically, and motivated us to search for an efficient sorting algorithm that can take advantage of it. In this paper, we describe the best strategy we have found, which is a novel two parts hybrid sort, based on the well-known Quicksort algorithm. The central partitioning operation is performed by a new algorithm, and small partitions/arrays are sorted using a branch-free Bitonic-based sort. This study is also an illustration of how classical algorithms can be adapted and enhanced by the AVX-512 extension. We evaluate the performance of our approach on a modern Intel Xeon Skylake and assess the different layers of our implementation by sorting/partitioning integers, double floating-point numbers, and key/value pairs of integers. Our results demonstrate that our approach is faster than two libraries of reference: the GNU \emph{C++} sort algorithm by a speedup factor of 4, and the Intel IPP library by a speedup factor of 1.4.Comment: 8 pages, research pape

    Optimized Merge Sort on Modern Commodity Multi-core CPUs

    Get PDF
    Sorting is a kind of widely used basic algorithms. As the high performance computing devices are increasingly common, more and more modern commodity machines have the capability of parallel concurrent computing. A new implementation of sorting algorithms is proposed to harness the power of newer SIMD operations and multi-core computing provided by modern CPUs. The algorithm is hybrid by optimized bitonic sorting network and multi-way merge. New SIMD instructions provided by modern CPUs are used in the bitonic network implementation, which adopted a different method to arrange data so that the number of SIMD operations is reduced. Balanced binary trees are used in multi-way merge, which is also different with former implementations. Efforts are also paid on minimizing data moving in memory since merge sort is a kind of memory-bound application. The performance evaluation shows that the proposed algorithm is twice as fast as the sort function in C++ standard library when only single thread is used. It also outperforms radix sort implemented in Boost library

    Spiking Neural P Systems – A Natural Model for Sorting Networks

    Get PDF
    This paper proposes two simulations of sorting networks with spiking neural P systems. A comparison between different models is also made

    An FPGA Implementation of Kak's Instantaneously-Trained, Fast-Classification Neural Networks

    Get PDF
    Motivated by a biologically plausible short-memory sketchpad, Kak's Fast Classification (FC) neural networks are instantaneously trained by using a prescriptive training scheme. Both weights and the topology for an FC network are specified with only two presentations of the training samples. Compared with iterative learning algorithms such as Backpropagation (which may require many thousands of presentations of the training data), the training of FC networks is extremely fast and learning convergence is always guaranteed. Thus FC networks are suitable for applications where real-time classification and adaptive filtering are needed. In this paper we show that FC networks are "hardware friendly" for implementation on FPGAs. Their unique prescriptive learning scheme can be integrated with the hardware design of the FC network through parameterization and compile-time constant folding

    Optimizing Quadratic Functions over the Set of Permutations

    Get PDF
    Seriation problem is widely used in many fields like archeology\cite{robinson1951method} and shotgun gene sequencing\cite{garriga2011banded,meidanis1998consecutive}. It aims to reorder a linear permutation based on given similarity information and it is an optimization problem over the set of permutation. Due to the large size of feasible set and the variable type, the seriation is an NP-hard quadratic mixed integer programming(MIP) problem. In order to solve this problem efficiently, a construction proposed recently by Goemans\cite{goemans2015smallest}, sorting network is used to constrain the solutions of the problem to be permutation and reformulate the problem. And we solve the MIP problem using heuristic method and branch and bound method and compare their performance

    Simulating the Bitonic Sort on a 2D-mesh with P Systems

    Get PDF
    This paper gives a version of the parallel bitonic sorting algorithm of Batcher, which can sort N elements in time O(log2 N). When applying it to the 2D mesh architecture, two indexing functions are considered, row-major and shuffled row- major. Some properties are proved for the later, together with a correctness proof of the proposed algorithm. Two simulations with P systems are proposed and discussed. The first one uses dynamic communication graphs and follows the guidelines of the mesh version of the algorithm. The second simulation requires only symbol rewriting rules in one membrane

    Sorting Omega Networks Simulated with P Systems: Optimal Data Layouts

    Get PDF
    The paper introduces some sorting networks and their simulation with P systems, in which each processor/membrane can hold more than one piece of data, and perform operations on them internally. Several data layouts are discussed in this context, and an optimal one is proposed, together with its implementation as a P system with dynamic communication graphs
    corecore