35,553 research outputs found

    An efficient parallel algorithm for the all pairs shortest path problem using processor arrays with reconfigurable bus systems

    The all pairs shortest path problem is a special case of the algebraic path problem. Many parallel algorithms for its solution appear in the literature. One of the efficient parallel algorithms on the W-RAM model is given by Kucera [17]. Though efficient, algorithms written for the W-RAM model of parallel computation are too idealistic to implement on current hardware. In this report we present an efficient parallel algorithm for the solution of this problem using a relatively new model of parallel computing, Processor Arrays with Reconfigurable Bus Systems. The parallel time complexity of the algorithm is O(log² n) and its processor complexity is n² × n × n.
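    For reference, a minimal sequential C++ sketch of the repeated min-plus squaring view of the algebraic path problem is given below: ⌈log₂ n⌉ "squarings" of the weight matrix yield all-pairs shortest paths. This is not the paper's PARBS algorithm (which maps these products onto a reconfigurable processor array); all names are ours.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;
// "Infinity" chosen so that INF + INF does not overflow a long long.
constexpr long long INF = std::numeric_limits<long long>::max() / 4;

// One min-plus product: C[i][j] = min over k of (A[i][k] + B[k][j]).
Matrix min_plus(const Matrix& A, const Matrix& B) {
    const std::size_t n = A.size();
    Matrix C(n, std::vector<long long>(n, INF));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                C[i][j] = std::min(C[i][j], A[i][k] + B[k][j]);
    return C;
}

// D starts as the edge-weight matrix (0 on the diagonal, INF for missing edges);
// after t squarings it holds shortest paths that use at most 2^t edges.
Matrix apsp(Matrix D) {
    for (std::size_t len = 1; len < D.size(); len *= 2)
        D = min_plus(D, D);
    return D;
}
```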

    Analysing the Performance of Divide-and-Conquer Algorithms on Multicore Processors

    Multicore systems are gaining popularity because of the significant availability and performance increase they offer over single-core systems. Multicore systems also consume less power and generate less heat than multiple single-core systems. The compiler support provided by different vendors has made multicore programming one of the main areas of research. Multicore programming uses the power of multiple cores to parallelise a task. Divide-and-conquer algorithms are a widely used paradigm for multicore programming: they are natural candidates because they split a problem into sub-problems that can be distributed among the cores and solved in parallel. A wide range of divide-and-conquer algorithms has been parallelised. In this paper, we take two widely used divide-and-conquer algorithms, quicksort and convex hull, implement them in parallel, and analyse their performance gain compared to the sequential versions. The parallel implementations distribute the load onto multiple cores, work on the partitions in parallel, and finally merge the individual results of each core. We also propose a scheme for efficiently merging the sorted sub-arrays produced by the parallel quicksort, based on the mean and standard deviation of the data. The OpenMP programming model has been used for the implementation of the programs, and the processor architecture used for analysing the behaviour of the algorithms is a shared-memory processor.
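    To make the task-parallel divide-and-conquer pattern concrete, here is a hedged C++/OpenMP sketch of a parallel quicksort: each sub-problem is spawned as an OpenMP task and small ranges are sorted serially. It is not the paper's implementation and omits its mean/standard-deviation merging scheme; the cutoff value and all function names are our choices.

```cpp
#include <algorithm>
#include <vector>
#include <omp.h>

void quicksort(std::vector<int>& a, int lo, int hi) {
    if (hi - lo < 1) return;
    // Three-way partition around a middle pivot using std::partition.
    int pivot = a[lo + (hi - lo) / 2];
    int* first = a.data() + lo;
    int* last  = a.data() + hi + 1;
    int* mid1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
    int* mid2 = std::partition(mid1, last, [pivot](int x) { return x == pivot; });
    int left_hi  = static_cast<int>(mid1 - a.data()) - 1;
    int right_lo = static_cast<int>(mid2 - a.data());

    // Spawn the two sub-problems as tasks; small ranges recurse serially.
    constexpr int kCutoff = 1 << 14;   // arbitrary cutoff, tune per machine
    #pragma omp task shared(a) if (left_hi - lo > kCutoff)
    quicksort(a, lo, left_hi);
    #pragma omp task shared(a) if (hi - right_lo > kCutoff)
    quicksort(a, right_lo, hi);
    #pragma omp taskwait
}

void parallel_sort(std::vector<int>& a) {
    #pragma omp parallel   // one team of threads...
    #pragma omp single     // ...one thread seeds the task tree
    quicksort(a, 0, static_cast<int>(a.size()) - 1);
}
```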

    Architectures for block Toeplitz systems

    In this paper, efficient VLSI architectures of highly concurrent algorithms for the solution of block linear systems with Toeplitz or near-to-Toeplitz entries are presented. The main features of the proposed scheme are the use of scalar-only operations (multiplications/divisions and additions) and the local communication, which enables the development of a wavefront array architecture. Both the mean-squared-error and the total-squared-error formulations are described, and a variety of implementations are given.
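    For orientation only, the sketch below is the classic scalar Levinson recursion for a symmetric (scalar, not block) Toeplitz system T x = y in C++. It is not the paper's block or wavefront scheme, but it shows the kind of scalar-only multiply/divide/add recurrences that such array architectures pipeline; all names are ours, and strong nonsingularity of T is assumed.

```cpp
#include <cstddef>
#include <vector>

// t[0..n-1] is the first column of the symmetric Toeplitz matrix, T(i,j) = t[|i-j|].
// Solves T x = y by the classic Levinson recursion (O(n^2) scalar operations).
std::vector<double> levinson(const std::vector<double>& t,
                             const std::vector<double>& y) {
    const std::size_t n = t.size();
    std::vector<double> f{1.0 / t[0]};   // forward vector: T_k f = e_0
    std::vector<double> x{y[0] / t[0]};  // current solution of the leading k-by-k system
    for (std::size_t k = 1; k < n; ++k) {
        // Errors from extending f with a trailing zero (e_f) and, by symmetry,
        // the reversed copy of f with a leading zero (e_b).
        double e_f = 0.0, e_b = 0.0;
        for (std::size_t j = 0; j < k; ++j) {
            e_f += t[k - j] * f[j];
            e_b += t[j + 1] * f[k - 1 - j];  // backward vector = reversed f (symmetric T)
        }
        double alpha = 1.0 / (1.0 - e_f * e_b);  // assumes leading minors are nonzero
        // New forward vector of length k+1: alpha*[f;0] - alpha*e_f*[0;rev(f)].
        std::vector<double> f_new(k + 1, 0.0);
        for (std::size_t j = 0; j < k; ++j) {
            f_new[j]     += alpha * f[j];
            f_new[j + 1] -= alpha * e_f * f[k - 1 - j];
        }
        f = f_new;
        // Extend x with a zero, measure its error on the new last equation,
        // and correct along the new backward vector (the reverse of f).
        double e_x = 0.0;
        for (std::size_t j = 0; j < k; ++j) e_x += t[k - j] * x[j];
        double c = y[k] - e_x;
        x.push_back(0.0);
        for (std::size_t j = 0; j <= k; ++j) x[j] += c * f[k - j];
    }
    return x;
}
```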

    A Bulk-Parallel Priority Queue in External Memory with STXXL

    We propose the design and an implementation of a bulk-parallel external-memory priority queue that takes advantage of both shared-memory parallelism and high external-memory transfer speeds to parallel disks. To achieve higher performance by decoupling item insertions and extractions, we offer two parallelization interfaces: one using "bulk" sequences, the other by defining "limit" items. In the design, we discuss how to parallelize insertions using multiple heaps, and how to calculate a dynamic prediction sequence to prefetch blocks and apply parallel multiway merge for extraction. Our experimental results show that in the selected benchmarks the priority queue reaches 75% of the full parallel I/O bandwidth of rotational disks and 65% of SSDs, or the speed of sorting in external memory when bounded by computation.
    Comment: extended version of the SEA'15 conference paper
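    The following is a purely hypothetical, in-memory C++ sketch (not the STXXL API) of the "bulk" idea described above: threads insert into thread-local buffers without locking, and a flush phase sorts the buffers in parallel and merges them into one extraction sequence. A real implementation would use a parallel multiway merge and spill to external memory; the class and method names are invented for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>
#include <omp.h>

class BulkParallelPQ {                                   // hypothetical name
    std::vector<std::vector<std::uint64_t>> insertion_;  // one buffer per thread
    std::vector<std::uint64_t> extraction_;              // merged, ascending
public:
    // Assumes at least omp_get_max_threads() buffers are requested.
    explicit BulkParallelPQ(int threads) : insertion_(threads) {}

    // Called concurrently during a bulk-insertion phase: no locks, no sharing.
    void bulk_push(std::uint64_t key) {
        insertion_[omp_get_thread_num()].push_back(key);
    }

    // Phase barrier: sort every thread-local buffer in parallel, then fold the
    // sorted runs into the extraction sequence (here with simple 2-way merges).
    void flush() {
        #pragma omp parallel for
        for (int i = 0; i < static_cast<int>(insertion_.size()); ++i)
            std::sort(insertion_[i].begin(), insertion_[i].end());
        for (auto& buf : insertion_) {
            std::vector<std::uint64_t> merged;
            merged.reserve(extraction_.size() + buf.size());
            std::merge(extraction_.begin(), extraction_.end(),
                       buf.begin(), buf.end(), std::back_inserter(merged));
            extraction_.swap(merged);
            buf.clear();
        }
    }

    bool empty() const { return extraction_.empty(); }

    std::uint64_t pop_min() {                        // smallest key first
        std::uint64_t k = extraction_.front();
        extraction_.erase(extraction_.begin());      // O(n); a sketch, not tuned
        return k;
    }
};
```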

    Efficient Representation of Computational Meshes

    We present a simple yet general and efficient approach to the representation of computational meshes. Meshes are represented as sets of mesh entities of different topological dimensions and their incidence relations. We discuss a straightforward and efficient storage scheme for such mesh representations and efficient algorithms for the computation of arbitrary incidence relations from a given initial and minimal set of incidence relations. The general representation may harbor a wide range of computational meshes, and may also be specialized to provide simple user interfaces for particular meshes, including simplicial meshes in one, two and three space dimensions where the mesh entities correspond to vertices, edges, faces and cells. We also discuss how the proposed concepts and data structures may be used for the assembly of variational forms in parallel over distributed finite element meshes. Benchmarks are presented to demonstrate efficiency in terms of CPU time and memory usage.
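    As a minimal illustration of the general idea (not the paper's actual data structures), the C++ sketch below stores an incidence relation in a flat CSR-like layout and derives the reverse relation, e.g. vertex → cell from a stored cell → vertex connectivity, by transposition; all type and function names are ours.

```cpp
#include <cstddef>
#include <vector>

// Connectivity d0 -> d1: for each entity of dimension d0, the indices of its
// incident entities of dimension d1, stored as a flat CSR-like structure.
struct Connectivity {
    std::vector<std::size_t> offsets;  // size = #entities(d0) + 1
    std::vector<std::size_t> indices;  // concatenated incidence lists
};

// Transpose a connectivity (e.g. cell->vertex with 3 vertices per triangle)
// to obtain the reverse relation (vertex->cell), one of the derived incidence
// computations the abstract refers to.
Connectivity transpose(const Connectivity& c, std::size_t num_targets) {
    Connectivity t;
    t.offsets.assign(num_targets + 1, 0);
    for (std::size_t v : c.indices) ++t.offsets[v + 1];            // count incidences
    for (std::size_t v = 0; v < num_targets; ++v)
        t.offsets[v + 1] += t.offsets[v];                          // prefix sums
    t.indices.resize(c.indices.size());
    std::vector<std::size_t> cursor(t.offsets.begin(), t.offsets.end() - 1);
    for (std::size_t e = 0; e + 1 < c.offsets.size(); ++e)         // each source entity
        for (std::size_t k = c.offsets[e]; k < c.offsets[e + 1]; ++k)
            t.indices[cursor[c.indices[k]]++] = e;
    return t;
}
```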