Communication-optimal Parallel and Sequential Cholesky Decomposition

Author: Grey Ballard
James Demmel
Oded Schwartz
Olga Holtz
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2009

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lower bounds on the communication cost (both for bandwidth and for latency) of conventional (O(n^3)) matrix multiplication to Cholesky factorization, which is used for solving dense symmetric positive definite linear systems. Second, we compare the costs of various Cholesky decomposition implementations to these lower bounds and identify the algorithms and data structures that attain them. In the sequential case, we consider both the two-level and hierarchical memory models. Combined with prior results in [13, 14, 15], this gives a set of communication-optimal algorithms for O(n^3) implementations of the three basic factorizations of dense linear algebra: LU with pivoting, QR and Cholesky. But it goes beyond this prior work on sequential LU by optimizing communication for any number of levels of memory hierarchy. Comment: 29 pages, 2 tables, 6 figures
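As a rough illustration of why data layout and blocking matter for Cholesky, the following is a minimal sketch of a right-looking blocked factorization in pure Python. It is not taken from the paper: the function names and the block size parameter `b` are assumptions chosen for illustration, and the sketch does not model memory traffic. It only shows the three block-level steps (factor the diagonal block, solve the panel, update the trailing matrix) that blocked implementations reorganize in order to reuse data while it is in fast memory.

```python
import math


def cholesky_unblocked(a):
    """Return lower-triangular L with L * L^T == a for a small SPD matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def blocked_cholesky(a, b):
    """Right-looking blocked Cholesky with (hypothetical) block size b.

    Only the lower triangle of `a` is read. Returns a new matrix L.
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a copy
    for k in range(0, n, b):
        ke = min(k + b, n)
        # Step 1: factor the diagonal block A[k:ke, k:ke].
        diag = [row[k:ke] for row in a[k:ke]]
        lkk = cholesky_unblocked(diag)
        for i in range(k, ke):
            for j in range(k, ke):
                a[i][j] = lkk[i - k][j - k]
        # Step 2: panel solve, L[i, k:ke] = A[i, k:ke] * Lkk^{-T}.
        for i in range(ke, n):
            for j in range(k, ke):
                s = sum(a[i][p] * a[j][p] for p in range(k, j))
                a[i][j] = (a[i][j] - s) / a[j][j]
        # Step 3: update the lower triangle of the trailing matrix.
        for i in range(ke, n):
            for j in range(ke, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k, ke))
    # Clear the strict upper triangle so the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

For example, `blocked_cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]], 2)` factors a small symmetric positive definite matrix with a block size of 2; any block size between 1 and n should give the same factor up to rounding.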

arXiv.org e-Print Archive

CiteSeerX

Crossref

Weak scalability analysis of the distributed-memory parallel MLFMA

Author: Bogaert Ignace
De Zutter Daniël
Fostier Jan
Michiels Bart
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

Distributed-memory parallelization of the multilevel fast multipole algorithm (MLFMA) relies on the partitioning of the internal data structures of the MLFMA among the local memories of networked machines. For three existing data partitioning schemes (spatial, hybrid and hierarchical partitioning), the weak scalability, i.e., the asymptotic behavior for proportionally increasing problem size and number of parallel processes, is analyzed. It is demonstrated that none of these schemes is weakly scalable. A nontrivial change to the hierarchical scheme is proposed, yielding a parallel MLFMA that does exhibit weak scalability. It is shown that, even for modest problem sizes and a modest number of parallel processes, the memory requirements of the proposed scheme are already significantly lower than those of existing schemes. Additionally, the proposed scheme is used to perform full-wave simulations of a canonical example, where the number of unknowns and the number of CPU cores are proportionally increased up to more than 200 million unknowns and 1024 CPU cores. The time per matrix-vector multiplication for an increasing number of unknowns and CPU cores corresponds very well to the theoretical time complexity.

Ghent University Academic Bibliography

Algorithms for Hierarchical and Semi-Partitioned Parallel Scheduling

Author: Bonifaci Vincenzo
Dangelo Gianlorenzo
Marchetti-Spaccamela Alberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

We propose a model for scheduling jobs in a parallel machine setting that takes into account the cost of migrations by assuming that the processing time of a job may depend on the specific set of machines among which the job is migrated. For the makespan minimization objective, the model generalizes classical scheduling problems such as unrelated parallel machine scheduling, as well as novel ones such as semi-partitioned and clustered scheduling. In the case of a hierarchical family of machines, we derive a compact integer linear programming formulation of the problem and leverage its fractional relaxation to obtain a polynomial-time 2-approximation algorithm. Extensions that incorporate memory capacity constraints are also discussed.

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Parallel Algorithms for Geometric Graph Problems

Author: Andoni Alexandr
Nikolov Aleksandar
Onak Krzysztof
Yaroslavtsev Grigory
Publication date: 01/01/2014

We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation algorithm with $n^{\delta}$ space in the streaming-with-sorting model with $1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.
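To make the approximation target concrete, the following is a minimal sketch of an exact Euclidean MST weight computation using Prim's algorithm in quadratic time. This is a sequential baseline chosen for illustration, not the parallel algorithm from the abstract. The names `mst_weight` and `within_factor` are hypothetical; the second helper simply checks whether a candidate tree weight lies within a factor of 1 + epsilon of the exact one.

```python
import math


def mst_weight(points):
    """Exact Euclidean MST weight via Prim's algorithm, O(n^2) time."""
    n = len(points)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Relax distances from the newly added point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def within_factor(candidate_weight, exact_weight, eps):
    """True if candidate_weight is a (1 + eps)-approximation of exact_weight."""
    return exact_weight <= candidate_weight <= (1.0 + eps) * exact_weight
```

A baseline of this kind is only practical for small inputs, since it examines all pairs of points, but it gives a reference value against which an approximate result could be compared.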

arXiv.org e-Print Archive

CiteSeerX

Target Assignment in Robotic Networks: Distance Optimality Guarantees and Hierarchical Strategies

Author: Chung Soon-Jo
Voulgaris Petros G.
Yu Jingjin
Publication date: 31/07/2014

We study the problem of multi-robot target assignment to minimize the total distance traveled by the robots until they all reach an equal number of static targets. In the first half of the paper, we present a necessary and sufficient condition under which true distance optimality can be achieved for robots with limited communication and target-sensing ranges. Moreover, we provide an explicit, non-asymptotic formula for computing the number of robots needed to achieve distance optimality in terms of the robots' communication and target-sensing ranges with arbitrary guaranteed probabilities. The same bounds are also shown to be asymptotically tight. In the second half of the paper, we present suboptimal strategies for use when the number of robots cannot be chosen freely. Assuming first that all targets are known to all robots, we employ a hierarchical communication model in which robots communicate only with other robots in the same partitioned region. This hierarchical communication model leads to constant approximations of true distance-optimal solutions under mild assumptions. We then revisit the limited communication and sensing models. By combining simple rendezvous-based strategies with a hierarchical communication model, we obtain decentralized hierarchical strategies that achieve constant approximation ratios with respect to true distance optimality. Simulation results show that the approximation ratio is as low as 1.4.
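The notion of a distance-optimal assignment can be illustrated with a small centralized baseline. The sketch below is an assumption-laden illustration, not the decentralized strategy described in the abstract: it enumerates every one-to-one assignment of robots to targets and keeps the one with the smallest total Euclidean distance. The function name `optimal_assignment` is hypothetical, and the brute-force search is only feasible for a handful of robots.

```python
import itertools
import math


def optimal_assignment(robots, targets):
    """Brute-force minimum total-distance assignment of robots to targets.

    Returns (cost, perm), where perm[i] is the index of the target
    assigned to robot i. Assumes equally many robots and targets.
    """
    if len(robots) != len(targets):
        raise ValueError("need the same number of robots and targets")
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(targets))):
        cost = sum(
            math.dist(robots[i], targets[perm[i]]) for i in range(len(robots))
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm
```

The approximation ratio of any other strategy on a given instance is then its total travel distance divided by the cost returned here.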

arXiv.org e-Print Archive

CiteSeerX

Caltech Authors