
    Graph Expansion and Communication Costs of Fast Matrix Multiplication

    The communication cost of algorithms (also known as I/O-complexity) is shown to be closely related to the expansion properties of the corresponding computation graphs. We demonstrate this on Strassen's and other fast matrix multiplication algorithms, and obtain the first lower bounds on their communication costs. In the sequential case, where the processor has a fast memory of size $M$, too small to store three $n$-by-$n$ matrices, the lower bound on the number of words moved between fast and slow memory is, for many of the matrix multiplication algorithms, $\Omega\left(\left(\frac{n}{\sqrt{M}}\right)^{\omega_0} \cdot M\right)$, where $\omega_0$ is the exponent in the arithmetic count (e.g., $\omega_0 = \lg 7$ for Strassen, and $\omega_0 = 3$ for conventional matrix multiplication). With $p$ parallel processors, each with fast memory of size $M$, the lower bound is $p$ times smaller. These bounds are attainable both for sequential and for parallel algorithms, and hence optimal. They can also be attained by many fast algorithms in linear algebra (e.g., algorithms for LU, QR, and solving the Sylvester equation).
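
    To make the bound concrete, the following sketch (our own illustration, not from the paper; the function name, sample sizes, and printed comparison are assumptions) evaluates the asymptotic expression $(n/\sqrt{M})^{\omega_0} \cdot M$ for Strassen's exponent $\omega_0 = \lg 7$ and for the classical $\omega_0 = 3$:

        from math import log2, sqrt

        def comm_words(n, M, omega0, p=1):
            """Asymptotic communication lower bound (n / sqrt(M))**omega0 * M,
            ignoring the constant hidden in the Omega; divided by p to reflect
            p parallel processors, each with fast memory of size M."""
            return (n / sqrt(M)) ** omega0 * M / p

        n, M = 1 << 14, 1 << 10                  # 16384-by-16384 matrices, 1024-word fast memory
        strassen  = comm_words(n, M, log2(7))    # omega0 = lg 7 ~ 2.807
        classical = comm_words(n, M, 3)          # omega0 = 3
        print(f"Strassen : {strassen:.3e} words")
        print(f"Classical: {classical:.3e} words")
        # The fast algorithm's bound grows more slowly with n, so its lower
        # bound on words moved is asymptotically smaller than the classical one.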

    Impact of mixed-parallelism on parallel implementations of the Strassen and Winograd matrix multiplication algorithms

    In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism, so-called mixed parallelism, on the Strassen and Winograd matrix multiplication algorithms. This work takes place in the context of Grid computing and, in particular, in the Client-Agent(s)-Server(s) model, where data can already be distributed on the platform. For each of these algorithms, we propose two mixed-parallel implementations. The former follows the phases of the original algorithm, while the latter was designed as the result of a list-scheduling algorithm. We give a theoretical comparison, in terms of memory usage and execution time, between our algorithms and classical data-parallel implementations. This analysis is corroborated by experiments. Finally, we give some hints about heterogeneous and recursive versions of our algorithms.
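
    For context, here is a minimal sequential sketch of the Strassen recursion that both papers build on (our own illustration, not the paper's mixed-parallel code; numpy, the cutoff value, and the test sizes are assumptions). The seven recursive products M1..M7 are mutually independent, which is exactly the task-parallelism that mixed-parallel implementations can exploit alongside data-parallel execution of each product:

        import numpy as np

        def strassen(A, B, cutoff=64):
            """Strassen's recursion for square matrices whose order is a power
            of two; falls back to the classical product below the cutoff."""
            n = A.shape[0]
            if n <= cutoff:
                return A @ B
            k = n // 2
            A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
            B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
            # The seven half-size products are independent tasks; a mixed-parallel
            # scheduler can run each one data-parallel on a subset of servers.
            M1 = strassen(A11 + A22, B11 + B22, cutoff)
            M2 = strassen(A21 + A22, B11, cutoff)
            M3 = strassen(A11, B12 - B22, cutoff)
            M4 = strassen(A22, B21 - B11, cutoff)
            M5 = strassen(A11 + A12, B22, cutoff)
            M6 = strassen(A21 - A11, B11 + B12, cutoff)
            M7 = strassen(A12 - A22, B21 + B22, cutoff)
            return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                             [M2 + M4, M1 - M2 + M3 + M6]])

        # Quick check against numpy's matmul on a random 256-by-256 instance:
        rng = np.random.default_rng(0)
        A = rng.standard_normal((256, 256))
        B = rng.standard_normal((256, 256))
        assert np.allclose(strassen(A, B), A @ B)

    Winograd's variant reorganizes the additions around the same seven products (15 additions instead of Strassen's 18), so the same task structure applies to both algorithms studied in the paper.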