    Improving of Quicksort Algorithm performance by sequential thread or parallel algorithms

    Quicksort is a well-known sorting algorithm that makes O(n log n) comparisons on average to sort a dataset of n items. Being a divide-and-conquer algorithm, it is easily modified to use parallel computing. The aim of this paper is to evaluate the performance of a parallel quicksort algorithm and compare it with the theoretical performance analysis. To achieve this, we implement a tool that runs both sequential and parallel quicksort on randomly generated arrays of different sizes over several runs, providing enough data to draw conclusions about the efficiency of combining the capabilities of modern multicore processors with algorithms designed to speed up the sorting of large arrays.
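
    A measurement tool of the kind described above might look roughly like the following. This is a minimal sketch, not the paper's tool: the names `quicksort` and `time_sort`, the middle-element pivot, and the choice to report the best of several runs are all assumptions made for illustration.

```python
import random
import time


def quicksort(a, lo=0, hi=None):
    """Sort list a in place with a sequential quicksort (middle-element pivot)."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        # Partition: move values below the pivot left and values above it right.
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth logarithmic in n.
        if j - lo < hi - i:
            quicksort(a, lo, j)
            lo = i
        else:
            quicksort(a, i, hi)
            hi = j


def time_sort(sort_fn, n, runs=3, seed=0):
    """Return the best wall-clock time (seconds) of sort_fn over several random arrays."""
    rng = random.Random(seed)
    timings = []
    for _ in range(runs):
        data = [rng.randrange(n) for _ in range(n)]
        start = time.perf_counter()
        sort_fn(data)
        timings.append(time.perf_counter() - start)
        assert data == sorted(data)  # sanity check outside the timed region
    return min(timings)
```

    Calling `time_sort(quicksort, n)` for a range of n, and again with a parallel variant in place of `quicksort`, would give the kind of paired measurements the abstract refers to.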

    A full parallel Quicksort algorithm for multicore processors

    The problem addressed in this paper is sorting an integer array a[] of length n in parallel on a multicore machine with p cores using Quicksort. Amdahl’s law tells us that the inherently sequential part of any algorithm will in the end dominate and limit the speedup we get from parallelisation. This paper introduces ParaQuick, a fully parallel Quicksort algorithm for an ordinary shared-memory multicore machine that has just a few simple statements in its sequential part. It can be seen as an improvement over the traditional parallelisation of Quicksort, where one follows the sequential algorithm and substitutes the recursive calls at the top of the recursion tree with the creation of parallel threads. The ParaQuick algorithm starts with k parallel threads, where k is a multiple of p (here k = 8*p), in a k-way partition of the original array with the same pivot value, and hence we get 2k partitioned areas in the first pass. We then calculate the pivot index, that is, where the division between the small and large elements would have been after an ordinary sequential Quicksort partition. In full parallel we then swap all small elements to the right of this pivot index with the large elements to the left of it; these two ‘displaced’ sets are by definition of equal size. We can then recursively use half of the threads for the left part and the other half for the right part (more details and synchronization considerations are given in the paper). Finally, when there is only one thread left working on one such area, sequential Quicksort and Insertionsort are used, as in the traditional way of doing parallel Quicksort. In the last part of the paper, this new algorithm is empirically tested against two other algorithms and Arrays.sort from the Java library. Five different distributions of the numbers to be sorted and three different machines with p = 2 (4 hyper-threaded), 4 (8) and 32 (64) cores are tested. Finally, conclusions are presented, together with an explanation of why ParaQuick is so much faster than a traditional parallelisation for large values of n and some distributions.
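
    The first ParaQuick pass described above can be sketched as follows. This is a single-threaded Python simulation under stated assumptions, not the paper's implementation: the names `partition_chunk` and `k_way_partition` are hypothetical, "small" is assumed to mean values less than or equal to the pivot, and the loops over chunks stand in for work that the abstract assigns to k parallel threads.

```python
def partition_chunk(a, lo, hi, pivot):
    """Partition a[lo:hi] in place so values <= pivot come first.

    Returns the index of the first 'large' element in the chunk.
    """
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    return i


def k_way_partition(a, k, pivot):
    """Partition the whole list around pivot via k chunk partitions plus one swap phase.

    Returns the global pivot index m: afterwards a[:m] <= pivot < a[m:].
    """
    n = len(a)
    bounds = [n * t // k for t in range(k + 1)]

    # Phase 1: every chunk is partitioned with the same pivot value.
    # In the parallel algorithm each chunk would belong to one thread.
    splits = [partition_chunk(a, bounds[t], bounds[t + 1], pivot) for t in range(k)]

    # Phase 2: the global pivot index is the total number of small elements.
    m = sum(splits[t] - bounds[t] for t in range(k))

    # Phase 3: collect the displaced elements on each side of m.
    large_on_left = [i for t in range(k)
                     for i in range(splits[t], bounds[t + 1]) if i < m]
    small_on_right = [i for t in range(k)
                      for i in range(bounds[t], splits[t]) if i >= m]

    # The two displaced sets always have the same size, so they pair up exactly.
    assert len(large_on_left) == len(small_on_right)
    for i, j in zip(large_on_left, small_on_right):
        a[i], a[j] = a[j], a[i]
    return m
```

    After the call, `a[:m]` and `a[m:]` can be handed to two halves of the thread group, which is the recursive step the abstract describes. The synchronization needed to run the three phases concurrently is not modelled here.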

    Applications for Multicore System

    A multi-core processor is a single computing unit with two or more processors (“cores”). These cores are integrated into a single IC for enhanced performance, reduced power consumption and more efficient simultaneous processing of multiple tasks. Homogeneous multi-core systems include only identical cores, whereas heterogeneous multi-core systems have cores that are not identical. Most computers and workstations these days have multi-core processors. However, most software is not designed to make use of multiple cores, so even when these programs run on new machines equipped with multi-core processors, we do not see sizable improvements in application performance. Performance improves only when the code is parallelized and the work is distributed among multiple cores, but writing the program logic to achieve this is complex. The conventional model of lock-based parallelism is difficult to use, error-prone and does not always lead to efficient use of resources. OpenMP gives programmers better support for parallel programming. In this work I have implemented the quicksort algorithm using the OpenMP library and analysed its performance in terms of execution time.
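
    To illustrate what distributing quicksort's work looks like in code, here is a minimal sketch that runs the two recursive calls concurrently for the first few levels of recursion. It uses Python threads purely as a stand-in and is not the OpenMP implementation the abstract refers to. The names `parallel_quicksort` and `_partition` and the `depth` cut-off are assumptions, and in CPython the global interpreter lock means this shows the structure rather than a real speedup.

```python
import threading


def _partition(a, lo, hi):
    """Partition a[lo..hi] around its middle element; return the pivot's final index."""
    mid = (lo + hi) // 2
    a[mid], a[hi] = a[hi], a[mid]  # move the chosen pivot to the end
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def parallel_quicksort(a, lo=0, hi=None, depth=2):
    """Sort a in place; the top `depth` recursion levels each spawn one extra thread."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    p = _partition(a, lo, hi)
    if depth > 0:
        # The two halves touch disjoint index ranges, so no locking is needed.
        left = threading.Thread(target=parallel_quicksort,
                                args=(a, lo, p - 1, depth - 1))
        left.start()
        parallel_quicksort(a, p + 1, hi, depth - 1)
        left.join()
    else:
        # Below the cut-off, fall back to plain sequential recursion.
        parallel_quicksort(a, lo, p - 1, 0)
        parallel_quicksort(a, p + 1, hi, 0)
```

    With `depth=2` at most four branches run at once, which mirrors the common practice of limiting the number of parallel tasks to roughly the number of cores.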

    Analysing the Performance of Divide-and-Conquer Algorithms on Multicore Processors

    Multicore systems are rapidly gaining popularity because of their wide availability and their performance advantage over single-core systems. A multicore system also consumes less power and generates less heat than multiple single-core systems. The compiler support provided by different vendors has also made multicore programming a major area of research. Multicore programming uses the power of multiple cores to parallelise a task. Divide-and-conquer is a widely used algorithmic paradigm for multicore programming. Divide-and-conquer algorithms are natural candidates because they divide a problem into sub-problems, which can be distributed among the different cores and solved in parallel. A wide range of divide-and-conquer algorithms has been parallelized. In this paper, we take two widely used divide-and-conquer algorithms, quick sort and convex hull, and implement them in parallel to analyse their performance gain compared to the sequential versions. The parallel implementations distribute the load onto multiple cores, work on the loads in parallel and finally merge the individual results from each core. We also propose a scheme, based on the mean and standard deviation, for efficiently merging the sub-arrays sorted in parallel by quick sort. The OpenMP programming model has been used to implement the programs. The processor architecture used for analysing the behaviour of the algorithms is a shared-memory processor.
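
    The distribute, sort and merge pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `sort_chunks_and_merge` and the `workers` parameter are hypothetical, Python's built-in `sorted` stands in for quick sort on each chunk, and a standard k-way heap merge is used in place of the paper's mean and standard deviation scheme, which the abstract does not specify.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def sort_chunks_and_merge(data, workers=4):
    """Split data into `workers` chunks, sort each one, then merge the sorted chunks."""
    n = len(data)
    bounds = [n * t // workers for t in range(workers + 1)]
    chunks = [data[bounds[t]:bounds[t + 1]] for t in range(workers)]

    # Each chunk is sorted independently; one worker per chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))

    # k-way merge of the already sorted chunks into one sorted list.
    return list(heapq.merge(*sorted_chunks))
```

    The final merge is sequential in this sketch, which is exactly the step an efficient merging scheme would aim to make cheaper.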

    Parallel Divide and Conquer

    We develop a generic divide and conquer algorithm for a parallel tree machine. From the generic algorithm we derive balanced, parallel versions of quicksort and the fast Fourier transform by substitution of data types, variables and statements. The performance of these algorithms is analyzed and measured on a Computing Surface configured as a tree machine with distributed memory.
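
    The idea of deriving a concrete algorithm from a generic divide-and-conquer template can be sketched as below. This is a sequential Python illustration only: the names `divide_and_conquer` and `quicksort_via_template` are hypothetical, the substitution is done with function arguments rather than by textual substitution, and nothing here models the tree machine or the balancing used in the paper.

```python
def divide_and_conquer(problem, is_base, solve_base, split, combine):
    """Generic template: split a problem, solve the parts recursively, combine the results."""
    if is_base(problem):
        return solve_base(problem)
    parts = split(problem)
    # The sub-problems are independent, so on a tree machine each one
    # could be handed to a different child node.
    solved = [divide_and_conquer(p, is_base, solve_base, split, combine)
              for p in parts]
    return combine(problem, solved)


def quicksort_via_template(items):
    """Quicksort obtained by plugging quicksort-specific steps into the template."""
    def split(xs):
        pivot = xs[len(xs) // 2]
        return [[x for x in xs if x < pivot], [x for x in xs if x > pivot]]

    def combine(xs, solved):
        pivot = xs[len(xs) // 2]
        equal = [x for x in xs if x == pivot]
        return solved[0] + equal + solved[1]

    return divide_and_conquer(
        list(items),
        is_base=lambda xs: len(xs) <= 1,
        solve_base=lambda xs: xs,
        split=split,
        combine=combine,
    )
```

    Supplying a different `split` and `combine` pair would yield another divide-and-conquer algorithm from the same template.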

    Analysis and design of parallel algorithms

    The present state of electronic technology is such that the factors limiting computation speed have almost been minimised; switching, for instance, is almost instantaneous. Electronic components are so good, in fact, that the time taken for a logic signal to travel between two points is now a significant factor in instruction times. Clearly, with the physical size of components being very small and circuit density high, there is little scope for improving computation speed significantly by such means as even denser circuitry or still faster electronic components. Thus, the development of faster computers will require a new approach that depends on the imaginative use of existing knowledge. One such approach is to increase computation speed through parallelism. Obviously, a parallel computer with p identical processors is potentially p times as fast as a single computer, although this limit can rarely be achieved.
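
    One common way to quantify why the p-fold limit is rarely reached is Amdahl's law, which bounds the speedup by the fraction of the work that must run sequentially. The sketch below is an illustration, not part of the abstract: the function name `amdahl_speedup` and the parameter `serial_fraction` are assumptions.

```python
def amdahl_speedup(p, serial_fraction):
    """Upper bound on speedup with p processors when a given fraction of the work is serial."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    # The serial part takes the same time regardless of p;
    # only the remaining (1 - serial_fraction) is divided among the processors.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
```

    With no serial work the bound equals p, and as `serial_fraction` grows the bound approaches 1 no matter how many processors are added.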