Search CORE

58,655 research outputs found

A Lower Bound Technique for Communication in BSP

Author: Bilardi Gianfranco
Scquizzato Michele
Silvestri Francesco
Publication venue
Publication date: 25/11/2017
Field of study

Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This paper presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT), and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models different from BSP, such as the I/O model and the LPRAM

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Recommended from our members

An improved connectionist activation function for energy minimization

Author: Dechter Rina
Pinkas Gadi
Publication venue: eScholarship, University of California
Publication date: 01/01/1992
Field of study

Symmetric networks that are based on energy minimization, such as Boltzmann machines or Hopfield nets, are used extensively for optimization, constraint satisfaction, and approximation of NP-hard problems. Nevertheless, finding a global minimum for the energy function is not guaranteed, and even a local minimum may take an exponential number of steps. We propose an improvement to the standard activation function used for such networks. The improved algorithm guarantees that a global minimum is found in linear time for tree-like subnetworks. The algorithm is uniform and does not assume that the network is a tree. It performs no worse than the standard algorithms for any network topology. In the case where there are trees growing from a cyclic subnetwork, the new algorithm performs better than the standard algorithms by avoiding local minima along the trees and by optimizing the free energy of these trees in linear time. The algorithm is self-stabilizing for trees (cycle-free undirected graphs) and remains correct under various scheduling demons. However, no uniform protocol exists to optimize trees under a pure distributed demon and no such protocol exists for cyclic networks under central demon

eScholarship - University of California

A survey of parallel execution strategies for transitive closure and logic programs

Author: Cacace F.
Ceri S.
Houtsma M.A.W.
Publication venue: Kluwer Academic Publishers
Publication date: 01/01/1993
Field of study

An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods

CiteSeerX

University of Twente Research Information

A study of the communication cost of the FFT on torus multicomputers

Author: Díaz de Cerio Ripalda Luis Manuel
González Colás Antonio María
Valero García Miguel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995
Field of study

The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison study shows that the best approach depends on the number of dimensions of the torus and the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Three-Level Parallel J-Jacobi Algorithms for Hermitian Matrices

Author: Aleksandar Ušćumlić
Bečka
Bojanczyk
Brent
Bunch
Bunch
Davor Davidović
Demmel
Dopico
Drmač
Eberlein
Hansen
Hari
Hari
Higham
Krešimir Bokulić
Luk
Luk
Okša
Parlett
Royo
Rutishauser
Sanja Singer
Saša Singer
Shroff
Singer
Singer
Slapničar
Slapničar
van der Sluis
Vedran Novaković
Veselić
Whiteside
Zha
Zhou
Publication venue: 'Elsevier BV'
Publication date: 24/08/2010
Field of study

The paper describes several efficient parallel implementations of the one-sided hyperbolic Jacobi-type algorithm for computing eigenvalues and eigenvectors of Hermitian matrices. By appropriate blocking of the algorithms an almost ideal load balancing between all available processors/cores is obtained. A similar blocking technique can be used to exploit local cache memory of each processor to further speed up the process. Due to diversity of modern computer architectures, each of the algorithms described here may be the method of choice for a particular hardware and a given matrix size. All proposed block algorithms compute the eigenvalues with relative accuracy similar to the original non-blocked Jacobi algorithm.Comment: Submitted for publicatio

arXiv.org e-Print Archive

CiteSeerX

Crossref

FAMENA Repository

Full-text Institutional Repository of the Ruđer Bošković Institute