Graph Expansion and Communication Costs of Fast Matrix Multiplication
The communication cost of algorithms (also known as I/O-complexity) is shown
to be closely related to the expansion properties of the corresponding
computation graphs. We demonstrate this on Strassen's and other fast matrix
multiplication algorithms, and obtain first lower bounds on their communication
costs.
In the sequential case, where the processor has a fast memory of size $M$,
too small to store three $n$-by-$n$ matrices, the lower bound on the number of
words moved between fast and slow memory is, for many of the matrix
multiplication algorithms, $\Omega\left(\left(\frac{n}{\sqrt{M}}\right)^{\omega_0} \cdot M\right)$,
where $\omega_0$ is the exponent in the arithmetic count (e.g., $\omega_0 = \lg 7$ for Strassen, and $\omega_0 = 3$ for conventional matrix multiplication).
With $p$ parallel processors, each with fast memory of size $M$, the lower
bound is $p$ times smaller.
These bounds are attainable both for sequential and for parallel algorithms
and hence optimal. These bounds can also be attained by many fast algorithms in
linear algebra (e.g., algorithms for LU, QR, and solving the Sylvester
equation).
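To get a feel for how the bound scales, the short Python sketch below evaluates the expression (n / sqrt(M))^omega0 * M, divided by p in the parallel case, where n is the matrix dimension, M the fast-memory size in words, p the number of processors, and omega0 the arithmetic exponent. The function name `comm_lower_bound` and the sample sizes are illustrative assumptions, not taken from the paper, and the hidden constant of the asymptotic bound is dropped, so the results are order-of-magnitude estimates only.

```python
import math

# Exponents named in the abstract: lg 7 for Strassen, 3 for conventional multiplication.
STRASSEN = math.log2(7)
CLASSICAL = 3.0


def comm_lower_bound(n, M, omega0, p=1):
    """Estimate (n / sqrt(M))**omega0 * M / p, with the asymptotic constant dropped.

    n      -- matrix dimension (hypothetical parameter name)
    M      -- fast-memory size in words, per processor
    omega0 -- exponent of the arithmetic count
    p      -- number of processors (1 = sequential case)
    """
    if 3 * n * n <= M:
        # The bound is stated only when three n-by-n matrices do not fit in fast memory.
        raise ValueError("fast memory already holds three n-by-n matrices")
    return (n / math.sqrt(M)) ** omega0 * M / p


if __name__ == "__main__":
    n, M = 1024, 4096  # illustrative sizes only
    print("classical:", comm_lower_bound(n, M, CLASSICAL))
    print("Strassen: ", comm_lower_bound(n, M, STRASSEN))
    print("Strassen, p=4:", comm_lower_bound(n, M, STRASSEN, p=4))
```

With these sample sizes the Strassen estimate is smaller than the classical one, reflecting the smaller exponent; this compares the two lower bounds only and says nothing about constants or measured running time.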
Impact of mixed-parallelism on parallel implementations of the Strassen and Winograd matrix multiplication algorithms
In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism, so-called mixed-parallelism, on the Strassen and Winograd matrix multiplication algorithms. This work takes place in the context of Grid computing and, in particular, in the Client-Agent(s)-Server(s) model, where data can already be distributed on the platform. For each of those algorithms, we propose two mixed-parallel implementations. The former follows the phases of the original algorithms while the latter has been designed as the result of a list scheduling algorithm. We give a theoretical comparison, in terms of memory usage and execution time, between our algorithms and classical data-parallel implementations. This analysis is corroborated by experiments. Finally, we give some hints about heterogeneous and recursive versions of our algorithm.