A full parallel Quicksort algorithm for multicore processors
The problem addressed in this paper is that we want to sort an integer array a[] of length n in parallel on a multicore machine with p cores using Quicksort. Amdahl's law tells us that the inherent sequential part of any algorithm will in the end dominate and limit the speedup we get from parallelisation. This paper introduces ParaQuick, a full parallel Quicksort algorithm for use on an ordinary shared-memory multicore machine that has just a few simple statements in its sequential part. It can be seen as an improvement over the traditional parallelisation of Quicksort, where one follows the sequential algorithm and substitutes recursive calls with the creation of parallel threads for those calls at the top of the recursion tree. The ParaQuick algorithm starts with k parallel threads, where k is a multiple of p (here k = 8*p), in a k-way partition of the original array with the same pivot value, and hence we get 2k partitioned areas in the first pass. We then calculate the pivot index, i.e. where the division between the small and the large elements would have been if this had been an ordinary sequential Quicksort partition. In full parallel we then swap all small elements to the right of this pivot index with the large elements to the left of it; these two "displaced" sets are by definition of equal size. We can then recursively handle the left part with half of the threads, and the right part with the other half (more details and synchronisation considerations are given in the paper). Finally, when only one thread is left working on such an area, sequential Quicksort and Insertion sort are used, as in the traditional way of doing parallel Quicksort. In the last part of the paper, this new algorithm is empirically tested against two other algorithms and Arrays.sort from the Java library. Five different distributions of the numbers to be sorted and three different machines with p = 2 (4 hyper-threaded), 4 (8) and 32 (64) are tested.
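The swap step described above can be sketched sequentially as follows (an illustrative sketch with assumed names, not the paper's code; in ParaQuick the counting and swapping are done by the k threads in parallel). Once every segment has been partitioned around the same pivot, the global pivot index is simply the total count of "small" elements, and the small elements stranded to the right of that index are exchanged with the large elements stranded to the left:

```java
// Sequential sketch of the ParaQuick fix-up step (illustrative, assumed
// names). Assumes a[] has already been partitioned segment-by-segment
// around the same pivot value.
public class ParaQuickFixup {
    static int fixup(int[] a, int pivot) {
        int small = 0;
        for (int v : a) if (v < pivot) small++;   // global pivot index
        int left = 0, right = small;
        while (true) {
            // find a large element left of the pivot index
            while (left < small && a[left] < pivot) left++;
            // find a small element right of the pivot index
            while (right < a.length && a[right] >= pivot) right++;
            if (left >= small || right >= a.length) break;
            int t = a[left]; a[left] = a[right]; a[right] = t; // swap the displaced pair
        }
        return small; // everything in a[0..small) is now < pivot
    }
}
```

The two displaced sets are the same size by construction, so both scans run out at the same time; afterwards the left and right parts can each be handled recursively by half of the threads.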
Finally, conclusions are presented and an explanation is given of why this ParaQuick algorithm is, for large values of n and some distributions, so much faster than a traditional parallelisation.
A faster all parallel Mergesort algorithm for multicore processors
The problem addressed in this paper is that we want to sort an integer array a[] of length n in parallel on a multicore machine with p cores using Mergesort. Amdahl's law tells us that the inherent sequential part of any algorithm will in the end dominate and limit the speedup we get from parallelisation. This paper introduces ParaMerge, an all parallel Mergesort algorithm for use on an ordinary shared-memory multicore machine that has just a few simple statements in its sequential part. The new algorithm is all parallel in the sense that, by recursive descent, it is two-parallel in the top node, four-parallel at the next level of the recursion, then eight-parallel, and so on, until at least one thread has been started for each of the p cores. After this parallelisation, each thread uses sequential recursive Mergesort with a variant of Insertion sort for sorting short subsections at the end. ParaMerge can be seen as an improvement over the traditional parallelisation of Mergesort, where one follows the sequential algorithm and substitutes recursive calls with the creation of parallel threads at the top of the recursion tree. This traditional parallel Mergesort finally does a sequential merging of the two sorted halves of a[]; only at the next level does it go two-parallel, then four-parallel at the level below, and so on. After parallelisation, my implementation of this traditional algorithm also uses the same sequential Mergesort and Insertion sort algorithms as ParaMerge in each thread. There are two main improvements in ParaMerge. The first is the observation that merging can be done both from the start, left to right, picking the smallest elements of the two sections to be merged, and at the same time from the end of the same sections, right to left, picking the largest elements. The second improvement is that the contract between a node and its two sub-nodes is changed.
In a traditional parallelisation a node is given a section of a[], sorts it by merging the two sorted halves it recursively receives from its own two sub-nodes, and returns it to its mother node. In ParaMerge, each of the two sub-nodes instead receives the section it got from its mother node fully sorted by its own two sub-nodes (so that problem is already solved). Every node has a twin node. In parallel, these two twin nodes then merge their two sorted sections, one from the left and the other from the right as described above. The two twin sub-nodes have then sorted the whole section given to their common mother node. This also applies to the top node. We have thus raised the level of parallelisation by a factor of two at each level at the top of the recursion tree. The ParaMerge algorithm also contains other improvements, such as a controlled sorting back and forth between a[] and a scratch area b[] of the same size, such that the sorted result always ends up in a[] without any copying, and a special Insertion sort that is central to achieving this copy-free feature. ParaMerge is compared with other published algorithms, and in only one case is one of the "new" features in ParaMerge found; this other algorithm is described and compared in some detail. Finally, ParaMerge is empirically compared with three other algorithms, sorting arrays of length n = 10, 20, …, 50m and … 1000m when p = 32, demonstrating that it is significantly faster than the two other merge algorithms (the sequential and the traditional parallel algorithm) and than Arrays.sort(), a sequential Quicksort algorithm from the Java library.
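The two-ended merge described above can be sketched as follows (an illustrative sketch with assumed names, not the paper's code). The "left twin" fills the lower half of the output by picking the smallest remaining elements from the fronts of the two sorted sections, while the "right twin" fills the upper half by picking the largest remaining elements from the backs; in ParaMerge the two twins run in parallel, but here both halves run in one thread:

```java
// Sequential sketch of the two-ended merge (illustrative, assumed names).
// u and v are the two sorted sections; the result is their sorted merge.
public class TwoEndedMerge {
    static int[] merge(int[] u, int[] v) {
        int n = u.length + v.length, half = n / 2;
        int[] out = new int[n];
        int i = 0, j = 0;                       // left twin: fronts of u and v
        for (int k = 0; k < half; k++)
            out[k] = (j >= v.length || (i < u.length && u[i] <= v[j])) ? u[i++] : v[j++];
        int p = u.length - 1, q = v.length - 1; // right twin: backs of u and v
        for (int k = n - 1; k >= half; k--)     // strict ">" mirrors the left twin's "<="
            out[k] = (q < 0 || (p >= 0 && u[p] > v[q])) ? u[p--] : v[q--];
        return out;
    }
}
```

Note the tie-breaking: the left twin prefers u on equal keys ("<=") and the right twin then prefers v (strict ">"), so the two twins consume every element exactly once even when the sections share values.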
Recommended from our members
Resource Contention in Real-time Systems
The divide-and-conquer method is extensively used for system design. For real-time systems, the separated components execute concurrently on some common computational infrastructure, and this can lead to contention for system resources such as processors, memory, communication channels, and so on. Unless the resource contention is accommodated, a system built from the composition of components may not function as expected, and the "proven" behaviour of the components can be invalidated. To overcome this uncertainty, a divide-conquer-and-system-composition method is required. This thesis takes a different approach from many of the existing notations, which focus on descriptions of behaviour. The Composite Transition System notation and algebra presented here enable the resource usage of the components to be specified and combined to form a composite system of concurrently executing components. By relating the composite system to the realisable behaviour of the system resources provided by the common infrastructure, it becomes possible to determine any violation of the constraints imposed by the system resources. If the composite system model is then constrained by the resource behaviours, it is possible, through an extraction operation, to determine the modified behaviour of the components that will yield a system free of resource contention. Component specification, concurrent composition, the application of system-level constraints and extraction are applied in this thesis to a system encountered in a commercial application. The purpose of this example is to demonstrate contention modelling and the mathematics of the notation, rather than to prove any specific properties of the application. Deployment of the notation to more complex applications will require the development of software tools to compute concurrent composition and extraction, and this is the motivation for the mathematical treatment in this thesis.
Ensuring Serializable Executions with Snapshot Isolation DBMS
Snapshot Isolation (SI) is a multiversion concurrency control that has been implemented by open source and commercial database systems such as PostgreSQL and Oracle. The main feature of SI is that a read operation does not block a write operation and vice versa, which allows a higher degree of concurrency than traditional two-phase locking. SI prevents many anomalies that appear in other isolation levels, but it can still result in non-serializable executions, in which database integrity constraints can be violated. Several techniques have been proposed to ensure serializable execution with engines running SI; these techniques are based on modifying the applications by introducing conflicting SQL statements. However, with each of these techniques the DBA has to make a difficult choice among the possible transactions to modify. This thesis helps DBAs to choose between these different techniques and choices by understanding how the choices affect system performance. It also proposes a novel technique called "External Lock Manager" (ELM), which introduces conflicts in a separate lock-manager object so that every execution will be serializable. We build a prototype system for ELM and run experiments to demonstrate the robustness of the new technique compared to the previous techniques. Experiments show that modifying the application code for some transactions has a high impact on performance for some choices, which makes it very hard for DBAs to choose wisely. However, ELM has peak performance similar to SI, no matter which transactions are chosen for modification. Thus we say that ELM is a robust technique for ensuring serializable execution.
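The core ELM idea, a lock-manager object living outside the DBMS in which potentially dangerous transactions materialise a conflict, can be sketched as follows (an illustrative sketch with assumed names and keys, not the thesis's prototype):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of an external lock manager (assumed names, not
// the thesis's implementation). Transactions that could participate in
// a non-serializable SI pattern acquire a lock on an agreed conflict
// key here before running, so the conflict is introduced without
// adding artificial SQL statements to the application.
public class ExternalLockManager {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void acquire(String conflictKey) {
        // one lock object per conflict key, created on first use
        locks.computeIfAbsent(conflictKey, k -> new ReentrantLock()).lock();
    }

    public void release(String conflictKey) {
        ReentrantLock l = locks.get(conflictKey);
        if (l != null && l.isHeldByCurrentThread()) l.unlock();
    }

    public boolean heldByMe(String conflictKey) {
        ReentrantLock l = locks.get(conflictKey);
        return l != null && l.isHeldByCurrentThread();
    }
}
```

A transaction would then bracket its work with acquire(key) and release(key) for the conflict key chosen by the DBA, e.g. one key per dangerous read-write pattern; two such transactions serialise on the lock while all other SI traffic is unaffected.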