Asynchronous parallel branch and bound and anomalies
The parallel execution of branch and bound algorithms can result in seemingly unreasonable speedups or slowdowns; the speedup is almost never equal to the increase in computing power. For synchronous parallel branch and bound, these effects have been studied extensively. For asynchronous parallelizations, little is known.
In this paper, we derive sufficient conditions to guarantee that an asynchronous parallel
branch and bound algorithm (with elimination by lower bound tests and dominance) will be
at least as fast as its sequential counterpart. The technique used for obtaining the results seems to be more generally applicable.
The essential observations are that, under certain conditions, the parallel algorithm
always works on at least one node that the sequential algorithm also branches from, and
that, once all such nodes have been eliminated, the parallel algorithm is able to conclude
that the optimal solution has been found.
Finally, some of the theoretical results are connected with a few practical
experiments.
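The elimination-by-bound mechanism the paper analyses is easy to illustrate sequentially. Below is a minimal Python sketch, assuming a 0/1 knapsack maximisation problem, so pruning uses an optimistic *upper* bound; the paper's lower-bound tests for minimisation are the mirror image. The function name `knapsack_bb` and the fractional bound are illustrative choices, not taken from the paper.

```python
def knapsack_bb(items, capacity):
    """Sequential branch and bound for 0/1 knapsack (illustrative).
    items: list of (value, weight) pairs with positive weights."""
    # Sort by value density so the fractional bound is tight.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)

    def bound(i, value, room):
        # Optimistic bound: fill the remaining room fractionally.
        for v, w in items[i:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    best = 0

    def visit(i, value, room):
        nonlocal best
        best = max(best, value)
        if i == len(items) or bound(i, value, room) <= best:
            return  # leaf, or node eliminated by the bound test
        v, w = items[i]
        if w <= room:
            visit(i + 1, value + v, room - w)  # branch: take item i
        visit(i + 1, value, room)              # branch: skip item i

    visit(0, 0, capacity)
    return best
```

A parallel version would expand several such nodes concurrently, which is exactly where the anomalies discussed above can arise: the set of nodes actually visited depends on the order in which incumbents are found.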
Three-Level Parallel J-Jacobi Algorithms for Hermitian Matrices
The paper describes several efficient parallel implementations of the
one-sided hyperbolic Jacobi-type algorithm for computing eigenvalues and
eigenvectors of Hermitian matrices. By appropriate blocking of the algorithms
an almost ideal load balancing between all available processors/cores is
obtained. A similar blocking technique can be used to exploit local cache
memory of each processor to further speed up the process. Due to diversity of
modern computer architectures, each of the algorithms described here may be the
method of choice for a particular hardware and a given matrix size. All
proposed block algorithms compute the eigenvalues with relative accuracy
similar to the original non-blocked Jacobi algorithm.
Comment: Submitted for publication.
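For reference, the non-blocked, non-hyperbolic kernel that such methods build on can be sketched as a classical one-sided Jacobi iteration: pairs of columns are rotated until they are mutually orthogonal, after which the column norms give the eigenvalues of a symmetric positive definite matrix. This pure-Python sketch is an assumed baseline only; the paper's blocked, hyperbolic, and parallel aspects are not reproduced.

```python
import math

def one_sided_jacobi_eigvals(A, sweeps=30, tol=1e-12):
    """Classical one-sided Jacobi on a symmetric positive definite
    matrix: rotate pairs of columns until mutually orthogonal; the
    final column norms are the eigenvalues."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]  # column-major copy
    for _ in range(sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                ap, aq = cols[p], cols[q]
                alpha = sum(x * x for x in ap)
                beta = sum(x * x for x in aq)
                gamma = sum(x * y for x, y in zip(ap, aq))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q already orthogonal
                converged = False
                # Rotation angle chosen to zero the inner product gamma.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(n):
                    ap[i], aq[i] = c * ap[i] - s * aq[i], s * ap[i] + c * aq[i]
        if converged:
            break
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

Because each rotation touches only two columns, independent column pairs can be processed in parallel, which is the opening the blocked parallel variants exploit.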
Multi-threading a state-of-the-art maximum clique algorithm
We present a threaded parallel adaptation of a state-of-the-art maximum clique
algorithm for dense, computationally challenging graphs. We show that near-linear speedups
are achievable in practice and that superlinear speedups are common. We include results for
several previously unsolved benchmark problems.
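The sequential core that such solvers parallelise can be sketched with a greedy-colouring bound: a branch is eliminated when the current clique size plus the colour-based upper bound cannot beat the incumbent. The Tomita-style sketch below is illustrative only and is not the authors' threaded bitset implementation.

```python
def max_clique(adj):
    """Sequential branch and bound for maximum clique.
    adj: dict mapping each vertex to its set of neighbours."""
    best = []

    def colour_order(cand):
        # Greedy colouring of cand; a vertex's colour number upper-bounds
        # the size of any clique in cand containing it and earlier vertices.
        classes, order, colours = [], [], []
        for v in sorted(cand, key=lambda u: len(adj[u] & cand), reverse=True):
            for k, cls in enumerate(classes):
                if not (adj[v] & cls):
                    cls.add(v)
                    order.append(v)
                    colours.append(k + 1)
                    break
            else:
                classes.append({v})
                order.append(v)
                colours.append(len(classes))
        return order, colours

    def expand(clique, cand):
        nonlocal best
        order, colours = colour_order(cand)
        for v, c in reversed(list(zip(order, colours))):
            if len(clique) + c <= len(best):
                return  # bound test: cannot beat the incumbent
            clique.append(v)
            rest = cand & adj[v]
            if rest:
                expand(clique, rest)
            elif len(clique) > len(best):
                best = clique[:]
            clique.pop()
            cand = cand - {v}

    expand([], set(adj))
    return set(best)
```

Threading such a search (as the paper does) means expanding several subtrees concurrently while sharing the incumbent, which is what makes superlinear speedups possible: one thread's early incumbent can prune another thread's subtree.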
Solving large scale linear programming
The interior point method (IPM) is now well established as a competitive technique for solving very large scale linear programming problems. The leading variant of the interior point method is the primal–dual predictor–corrector algorithm due to Mehrotra. The main computational steps of this algorithm are the repeated calculation and solution of a large sparse positive definite system of equations.
We describe an implementation of the predictor–corrector IPM algorithm on the MasPar, a massively parallel SIMD computer. At the heart of the implementation is a parallel Cholesky factorization algorithm for sparse matrices. Our implementation uses a new scheme for mapping the matrix onto the processor grid of the MasPar, which results in a more efficient Cholesky factorization than previously suggested schemes.
The IPM implementation uses the parallel unit of the MasPar to speed up the factorization and other computationally intensive parts of the IPM. An important part of this implementation is the judicious division of data and computation between the front-end computer, which runs the main IPM algorithm, and the parallel unit.
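The factorization at the heart of the implementation follows the standard Cholesky recurrence. A minimal dense, sequential sketch is shown below; the paper's version is sparse and parallel, and nothing here reflects the MasPar mapping scheme.

```python
import math

def cholesky(A):
    """Dense Cholesky factorisation A = L * L^T of a symmetric
    positive definite matrix; returns lower-triangular L.
    Column j needs only columns 0..j-1, which is what parallel
    sparse variants exploit for concurrency."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract squares of the finished row of L.
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

In the IPM context, this factorization is re-done at every iteration on the same sparsity pattern, which is why its efficiency dominates the overall running time.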
Parallelizing RRT on large-scale distributed-memory architectures
This paper addresses the problem of parallelizing the Rapidly-exploring Random Tree (RRT) algorithm on large-scale distributed-memory architectures, using the Message Passing Interface. We compare three parallel versions of RRT based on classical parallelization schemes. We evaluate them on different motion planning problems and analyze the various factors influencing their performance.
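The sequential baseline that the parallel versions distribute can be sketched in a few lines: sample, find the nearest tree node, steer a bounded step, and add the new node if it is collision-free. The sampling domain, step size, goal bias, and `is_free` callback below are illustrative assumptions, not taken from the paper.

```python
import math
import random

def rrt(start, goal, is_free, step=0.5, iters=2000, goal_tol=0.5, seed=1):
    """Minimal sequential RRT in the 2D square [0, 10] x [0, 10].
    is_free(p) -> bool is an assumed collision-check callback.
    Returns (tree, node): tree maps each node to its parent; node is
    a tree node within goal_tol of the goal, or None on failure."""
    rng = random.Random(seed)
    tree = {start: None}
    for _ in range(iters):
        # Sample a configuration, biased toward the goal 5% of the time.
        q = goal if rng.random() < 0.05 else (rng.uniform(0, 10), rng.uniform(0, 10))
        near = min(tree, key=lambda p: math.dist(p, q))  # nearest tree node
        if near == q:
            continue
        d = math.dist(near, q)
        # Steer: move at most `step` from `near` toward the sample.
        if d <= step:
            new = q
        else:
            new = (near[0] + step * (q[0] - near[0]) / d,
                   near[1] + step * (q[1] - near[1]) / d)
        if is_free(new):
            tree[new] = near
            if math.dist(new, goal) <= goal_tol:
                return tree, new
    return tree, None
```

The classical parallelization schemes compared in the paper differ in which of these steps they distribute: whole independent trees, batches of samples, or shares of a single tree across MPI ranks.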
Parallel Algorithms for Generating Random Networks with Given Degree Sequences
Random networks are widely used for modeling and analyzing complex processes.
Many mathematical models have been proposed to capture diverse real-world
networks. One of the most important aspects of these models is degree
distribution. The Chung–Lu (CL) model is a random network model that can produce
networks with any given degree distribution. The complex systems we
deal with nowadays are growing larger and more diverse than ever. Generating
random networks with any given degree distribution consisting of billions of
nodes and edges or more has become a necessity, which requires efficient and
parallel algorithms. We present an MPI-based distributed-memory parallel
algorithm for generating massive random networks using the CL model, with
high-probability time bounds and per-processor space bounds expressed in terms
of the number of nodes, edges, and processors. The time efficiency is achieved
by using a novel load-balancing algorithm. Our algorithms scale very well to a
large number of processors and can generate massive power-law networks with one
billion nodes and billions of edges in one minute.
Comment: Accepted in NPC 201
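The CL model itself is simple to state: edge {i, j} appears independently with probability min(1, w_i * w_j / S), where w_i are the prescribed expected degrees and S is their sum. A naive O(n^2) sequential sketch follows; the paper's contribution, the efficient load-balanced MPI parallelisation, is not shown here.

```python
import random

def chung_lu(weights, seed=0):
    """Naive sequential Chung–Lu generator: include each edge {i, j}
    independently with probability min(1, w_i * w_j / S), so vertex i
    ends up with expected degree approximately w_i."""
    rng = random.Random(seed)
    S = sum(weights)
    n = len(weights)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair tried once
            if rng.random() < min(1.0, weights[i] * weights[j] / S):
                edges.add((i, j))
    return edges
```

Trying all n-choose-2 pairs is exactly what becomes infeasible at billions of nodes, which motivates the paper's efficient parallel algorithm.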