Parallel Performance of MPI Sorting Algorithms on Dual-Core Processor Windows-Based Systems

Author: Elnashar Alaa Ismail
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 30/05/2011

Message Passing Interface (MPI) is widely used to implement parallel programs. Although Windows-based architectures provide facilities for parallel execution and multi-threading, little attention has been paid to using MPI on these platforms. In this paper we use a dual-core Windows-based platform to study the effect of the number of parallel processes and the number of cores on the performance of three MPI parallel implementations of some sorting algorithms.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Reliability and fault tolerance modelling of multiprocessor systems

Author: Valdivia Roberto Abraham
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/1989

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Reliability evaluation by analytic modelling constitutes an important issue in designing a reliable multiprocessor system. In this thesis, a model for reliability and fault tolerance analysis of the interconnection network is presented, based on graph theory. Reliability and fault tolerance are considered as deterministic and probabilistic measures of connectivity. Exact techniques for reliability evaluation fail for large multiprocessor systems because of the enormous computational resources required. Therefore, approximation techniques have to be used. Three approaches are proposed: the first simplifies the symbolic expression of reliability; the other two apply a hierarchical decomposition to the system. All these methods give results close to those obtained by exact techniques. "Consejo Nacional de Ciencia y Tecnologia" (National Council for Science and Technology of Mexico) and "Instituto de Investigaciones Electricas" (Institute for Electrical Research).

Brunel University Research Archive

The Lock-free $k$-LSM Relaxed Priority Queue

Author: Gruber Jakob
Träff Jesper Larsson
Tsigas Philippas
Wimmer Martin
Publication date: 01/01/2015

Priority queues are data structures which store keys in an ordered fashion to allow efficient access to the minimal (maximal) key. Priority queues are essential for many applications, e.g., Dijkstra's single-source shortest path algorithm, branch-and-bound algorithms, and prioritized schedulers. Efficient multiprocessor computing requires implementations of basic data structures that can be used concurrently and scale to large numbers of threads and cores. Lock-free data structures promise superior scalability by avoiding blocking synchronization primitives, but the delete-min operation is an inherent scalability bottleneck in concurrent priority queues. Recent work has focused on alleviating this obstacle either by batching operations, or by relaxing the requirements to the delete-min operation. We present a new, lock-free priority queue that relaxes the delete-min operation so that it is allowed to delete any of the $\rho+1$ smallest keys, where $\rho$ is a runtime configurable parameter. Additionally, the behavior is identical to a non-relaxed priority queue for items added and removed by the same thread. The priority queue is built from a logarithmic number of sorted arrays in a way similar to log-structured merge-trees. We experimentally compare our priority queue to recent state-of-the-art lock-free priority queues, both with relaxed and non-relaxed semantics, showing high performance and good scalability of our approach. Comment: Short version as ACM PPoPP'15 poster
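The relaxed delete-min semantics described in this abstract can be illustrated with a minimal sequential sketch. It is not the authors' lock-free $k$-LSM structure: the class name `RelaxedPriorityQueue`, the random choice among candidates, and the use of a binary heap are all assumptions made for illustration, and the per-thread ordering guarantee mentioned in the abstract is not modeled.

```python
import heapq
import random


class RelaxedPriorityQueue:
    """Sequential toy model of a rho-relaxed priority queue.

    Hypothetical illustration only: single-threaded, not lock-free,
    and not built from sorted arrays like the k-LSM.
    """

    def __init__(self, rho, seed=None):
        if rho < 0:
            raise ValueError("rho must be non-negative")
        self.rho = rho
        self._keys = []                    # maintained as a binary min-heap
        self._rng = random.Random(seed)    # source of the "relaxed" choice

    def insert(self, key):
        heapq.heappush(self._keys, key)

    def delete_min(self):
        """Remove and return one of the rho + 1 smallest keys."""
        if not self._keys:
            raise IndexError("delete_min from an empty queue")
        # Pop the (up to) rho + 1 smallest keys, keep one, push the rest back.
        count = min(self.rho + 1, len(self._keys))
        candidates = [heapq.heappop(self._keys) for _ in range(count)]
        chosen = candidates.pop(self._rng.randrange(count))
        for key in candidates:
            heapq.heappush(self._keys, key)
        return chosen

    def __len__(self):
        return len(self._keys)
```

With `rho=0` the sketch behaves like an ordinary priority queue; larger values of `rho` widen the set of keys that a single `delete_min` call may return.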

arXiv.org e-Print Archive

Crossref

Chalmers Research

Towards Distributed Convoy Pattern Mining

Author: Ester M.
Ghemawat S.
Hua K. A.
Kwon Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

Mining movement data to reveal interesting behavioral patterns has gained attention in recent years. One such pattern is the convoy pattern, which consists of at least m objects moving together for at least k consecutive time instants, where m and k are user-defined parameters. Existing algorithms for detecting convoy patterns, however, do not scale to real-life dataset sizes. Therefore, a distributed algorithm for convoy mining is needed. In this paper, we discuss the problem of convoy mining and analyze different data partitioning strategies to pave the way for a generic distributed convoy pattern mining algorithm. Comment: SIGSPATIAL'15, November 03-06, 2015, Bellevue, WA, USA
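The convoy definition in this abstract (at least m objects together for at least k consecutive time instants) can be checked with a short sketch. The abstract does not say how "moving together" is decided, so the sketch assumes that the objects have already been grouped into clusters at each time instant; the function names and the input layout are hypothetical.

```python
def longest_together_run(group, clusters_per_instant):
    """Longest run of consecutive instants in which `group` sits inside one cluster.

    `clusters_per_instant` is a list with one entry per time instant; each
    entry is a list of sets of object ids (an assumed, pre-computed grouping).
    """
    group = frozenset(group)
    best = run = 0
    for clusters in clusters_per_instant:
        if any(group <= cluster for cluster in clusters):
            run += 1
            best = max(best, run)
        else:
            run = 0    # the group split up, so the run of instants restarts
    return best


def is_convoy(group, clusters_per_instant, m, k):
    """True if `group` has at least m objects that stay together for k consecutive instants."""
    return (len(set(group)) >= m
            and longest_together_run(group, clusters_per_instant) >= k)
```

For example, with the timeline `[[{"a", "b", "c"}], [{"a", "b"}, {"c"}], [{"a", "b"}]]`, the group `{"a", "b"}` satisfies `m=2, k=3`, while `{"a", "b", "c"}` stays together for only one instant.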

arXiv.org e-Print Archive

Crossref

Institutional Repository Universiteit Antwerpen

HAL Université de Tours

Applications for Multicore System

Author: Prasad Mamta Kumari
Publication date: 01/01/2013

A multi-core processor is a single computing unit with two or more processors ("cores"). These cores are integrated into a single IC for enhanced performance, reduced power consumption and more efficient simultaneous processing of multiple tasks. Homogeneous multi-core systems include only identical cores, whereas heterogeneous multi-core systems have cores that are not identical. Most computers and workstations these days have multi-core processors. However, most software programs are not designed to make use of multi-core processors, and hence, even when we run these programs on new machines equipped with multi-core processors, we do not see sizable improvements in application performance. The idea behind improved performance is to parallelize the code and distribute the work amongst multiple cores, but writing programming logic to achieve this is complex. The conventional model of lock-based parallelism for writing such programs is difficult to use, error-prone and does not always lead to efficient use of the resources, but with the help of OpenMP, programmers have enhanced support for parallel programming. In this work I have implemented the quicksort algorithm using the OpenMP library and analysed the performance in terms of execution time.

ethesis@nitr

GPU-ArraySort: A parallel, in-place algorithm for sorting large number of arrays

Author: Awan Muaaz
Saeed Fahad
Publication venue: ScholarWorks at WMU
Publication date: 15/08/2016

Modern-day analytics deals with big datasets from diverse fields. For many applications the data is in the form of an array which consists of a large number of smaller arrays. Existing techniques focus on sorting a single large array and cannot be used for sorting a large number of smaller arrays in an efficient manner. Currently no algorithm is available that can sort such a large number of arrays by utilizing the massively parallel architecture of GPU devices. In this paper we present a highly scalable parallel algorithm, called GPU-ArraySort, for sorting a large number of arrays using a GPU. Our algorithm performs in-place operations and makes minimum use of any temporary run-time memory. Our results indicate that we can sort up to 2 million arrays with 1000 elements each within a few seconds. We compare our results with the unorthodox tagged array sorting technique based on NVIDIA's Thrust library. GPU-ArraySort outperforms the tagged array sorting technique by sorting three times more data in much less time. The developed tool and strategy will be made available at https://github.com/pcdslab

ScholarWorks at WMU

Algorithms for the NJIT turbonet parallel computer

Author: Lad Nitin J.
Publication venue: Digital Commons @ NJIT
Publication date: 31/10/1995

Element selection for arrays, array merging, and sorting are very frequent operations in many of today's important applications. These operations are of interest to scientific, as well as other applications where high-speed database search, merge, and sort operations are necessary and frequent. Therefore, their efficient implementation on parallel computers should be a worthwhile objective. Parallel algorithms are presented in this thesis for the implementation of these operations on the NJIT TurboNet system, an in-house built experimental parallel computer with TMS320C40 Digital Signal Processors interconnected in a 3-D hypercube structure. The first algorithm considered is selection. It involves finding the k-th smallest element in an unsorted sequence of n elements, where 1≤k≤n. The second algorithm involves the merging of two sequences sorted in nondecreasing order to form a third sequence, also sorted in nondecreasing order. The third parallel algorithm is sorting. For a given unsorted sequence S of size n, we want to sort the sequence such that s_i' ≤ s_{i+1}' for all n elements. Performance results show that the robust structure of TurboNet results in significant speedups.
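The selection and merging operations defined in this abstract can be sketched sequentially as follows. This is not the thesis's parallel hypercube implementation: the sketch swaps in a plain quickselect and a two-pointer merge, and the function names `kth_smallest` and `merge_sorted` are hypothetical.

```python
import random


def kth_smallest(seq, k):
    """Return the k-th smallest element of seq, with 1 <= k <= len(seq)."""
    if not 1 <= k <= len(seq):
        raise ValueError("k must satisfy 1 <= k <= n")
    items = list(seq)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k <= len(smaller):
            items = smaller                      # answer lies below the pivot
        elif k <= len(smaller) + len(equal):
            return pivot                         # answer is the pivot itself
        else:
            k -= len(smaller) + len(equal)       # skip everything <= pivot
            items = [x for x in items if x > pivot]


def merge_sorted(a, b):
    """Merge two nondecreasing sequences into one nondecreasing list."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])    # at most one of these two tails is non-empty
    merged.extend(b[j:])
    return merged
```

Calling `merge_sorted` on two sorted halves is also the combining step of a merge sort, which is one way the three operations named in the abstract relate to each other.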

Digital Commons @ New Jersey Institute of Technology (NJIT)