Message lower bounds via efficient network synchronization

Author: Pandurangan Gopal
Peleg David
Scquizzato Michele
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

Archivio istituzionale della ricerca - Università di Padova

A Lower Bound Technique for Communication in BSP

Author: Bilardi Gianfranco
Scquizzato Michele
Silvestri Francesco
Publication date: 25/11/2017

Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This paper presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT), and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models different from BSP, such as the I/O model and the LPRAM.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution

Author: Elango Venmugil
Pouchet Louis-Noël
Ramanujam J.
Rastello Fabrice
Sadayappan P.
Publication date: 01/01/2014

Technology trends are making the cost of data movement increasingly dominant, both in terms of energy and time, over the cost of performing arithmetic operations in computer systems. The fundamental ratio of aggregate data movement bandwidth to the total computational power (also referred to as the machine balance parameter) in parallel computer systems is decreasing. It is therefore of considerable importance to characterize the inherent data movement requirements of parallel algorithms, so that the minimal architectural balance parameters required to support them on future systems can be well understood. In this paper, we extend the well-known red-blue pebble game to derive lower bounds on the data movement complexity for the parallel execution of computational directed acyclic graphs (CDAGs) on parallel systems. We model multi-node multi-core parallel systems, with the total physical memory distributed across the nodes (that are connected through some interconnection network) and in a multi-level shared cache hierarchy for processors within a node. We also develop new techniques for lower bound characterization of non-homogeneous CDAGs. We demonstrate the use of the methodology by analyzing the CDAGs of several numerical algorithms, to develop lower bounds on data movement for their parallel execution.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

On the impact of communication complexity in the design of parallel numerical algorithms

Author: Gannon D.
Vanrosendale J.

This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.

NASA Technical Reports Server

Communication Steps for Parallel Query Processing

Author: Beame Paul
Koutris Paraschos
Suciu Dan
Publication date: 01/01/2013

We consider the problem of computing a relational query $q$ on a large input database of size $n$, using a large number $p$ of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits of data, where $\varepsilon \in [0,1]$ is a parameter that controls replication. We examine how many global communication steps are needed to compute $q$. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires $\varepsilon \geq 1-1/\tau^*$, where $\tau^*$ is the fractional vertex cover of the hypergraph of $q$. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent $\varepsilon$. The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in $O(1)$ rounds of communication.
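As a small worked instance of the one-round bound $\varepsilon \geq 1-1/\tau^*$, consider the triangle query. This example is an illustration chosen here, not one stated in the abstract; the relation names $R, S, T$ and the weights $v_x, v_y, v_z$ are assumed notation.

```latex
% Triangle query (illustrative, not taken from the abstract)
q(x,y,z) = R(x,y),\ S(y,z),\ T(z,x)

% Fractional vertex cover: weights v_x, v_y, v_z \ge 0 such that
% every hyperedge (atom) is covered with total weight at least 1
v_x + v_y \ge 1, \quad v_y + v_z \ge 1, \quad v_z + v_x \ge 1

% Summing the three constraints gives 2(v_x + v_y + v_z) \ge 3,
% and v_x = v_y = v_z = 1/2 attains this value, so
\tau^* = \tfrac{3}{2}, \qquad
\varepsilon \ge 1 - \frac{1}{\tau^*} = \frac{1}{3}
```

Under this reading, a one-round algorithm for the triangle query would need each server to receive on the order of $n/p^{2/3}$ bits.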

arXiv.org e-Print Archive

CiteSeerX

Crossref

Self-Stabilizing Repeated Balls-into-Bins

Author: Becchetti Luca
Clementi Andrea
Natale Emanuele
Pasquale Francesco
Posta Gustavo
Publication date: 01/01/2015

We study the following synchronous process that we call "repeated balls-into-bins". The process is started by assigning $n$ balls to $n$ bins in an arbitrary way. In every subsequent round, from each non-empty bin one ball is chosen according to some fixed strategy (random, FIFO, etc.), and re-assigned to one of the $n$ bins uniformly at random. We define a configuration "legitimate" if its maximum load is $\mathcal{O}(\log n)$. We prove that, starting from any configuration, the process will converge to a legitimate configuration in linear time and then it will only take on legitimate configurations over a period of length bounded by any polynomial in $n$, with high probability (w.h.p.). This implies that the process is self-stabilizing and that every ball traverses all bins in $\mathcal{O}(n \log^2 n)$ rounds, w.h.p.
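The process described in this abstract is simple enough to simulate. The following is a minimal sketch, not the authors' code: the function name `repeated_balls_into_bins`, the fixed seed, and the choice of starting configuration are assumptions made here for illustration. Because the sketch tracks only bin loads, the strategy for choosing which ball leaves a bin (random, FIFO, etc.) does not affect it.

```python
import random


def repeated_balls_into_bins(loads, rounds, seed=0):
    """Simulate the process on bin loads; return (final loads, max load per round)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    n = len(loads)
    loads = list(loads)  # copy so the caller's list is not modified
    max_loads = []
    for _ in range(rounds):
        # Each bin that is non-empty at the start of the round releases one ball.
        senders = [i for i in range(n) if loads[i] > 0]
        for i in senders:
            loads[i] -= 1
        # Every released ball is re-assigned to a bin chosen uniformly at random.
        for _ in senders:
            loads[rng.randrange(n)] += 1
        max_loads.append(max(loads))
    return loads, max_loads


n = 64
start = [n] + [0] * (n - 1)  # an extreme start: all n balls in one bin
final, history = repeated_balls_into_bins(start, rounds=4 * n)
print("max load after first round:", history[0])
print("max load after last round:", history[-1])
```

Plotting `history` for several values of `n` is one way to get an informal feel for the linear-time convergence claim, although a simulation of this kind is no substitute for the w.h.p. analysis in the paper.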

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

MPG.PuRe

Hal-Diderot

Tight Bounds for On-line Tree Embedding

Author: Bhatt Sandeep
Greenberg David
Leighton Tom
Liu Pangfeng
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/1991

Many tree-structured computations are inherently parallel. As leaf processes are recursively spawned they can be assigned to independent processors in a multicomputer network. To maintain load balance, an on-line mapping algorithm must distribute processes equitably among processors. Additionally, the algorithm itself must be distributed in nature, and process allocation must be completed via message-passing with minimal communication overhead. This paper investigates bounds on the performance of deterministic and randomized algorithms for on-line tree embedding. In particular, we study tradeoffs between performance (load balance) and communication overhead (message congestion). We give a simple technique to derive lower bounds on the congestion that any on-line allocation algorithm must incur in order to guarantee load balance. This technique works for both randomized and deterministic algorithms, although we find the performance of randomized on-line algorithms to be somewhat better than that of deterministic algorithms. Optimal bounds are achieved for several networks including multi-dimensional grids and butterflies.

Caltech Authors