5 research outputs found
Time-Optimal and Conflict-Free Mappings of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays
Most existing methods of mapping algorithms into processor arrays are restricted to the case where n-dimensional algorithms or algorithms with n nested loops are mapped into (n-1)-dimensional arrays. However, in practice, it is interesting to map n-dimensional algorithms into (k-1)-dimensional arrays where k < n. For example, many algorithms at bit-level are at least 4-dimensional (matrix multiplication, convolution, LU decomposition, etc.) and most existing bit-level processor arrays are 2-dimensional. A computational conflict occurs if two or more computations of an algorithm are mapped into the same processor and the same execution time. In this paper, necessary and sufficient conditions are derived to identify all mappings without computational conflicts, based on the Hermite normal form of the mapping matrix. These conditions are used to propose methods of mapping any n-dimensional algorithm into (k-1)-dimensional arrays, k [...] n-3, optimality of the mapping is guaranteed.
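The definition of a computational conflict given in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method: instead of the Hermite normal form conditions, it simply enumerates a small index set and checks whether two index points share the same image under the mapping matrix. The function name `has_conflict`, the box-shaped index set, and the example matrices are assumptions made for illustration.

```python
from itertools import product


def has_conflict(T, upper):
    """Brute-force conflict check (illustrative, not the paper's HNF test).

    T     : mapping matrix as a list of k integer rows of length n.
            By assumption, one row gives the execution time and the
            remaining k-1 rows give the processor coordinates.
    upper : upper bounds of an assumed box index set 0 <= j_i <= upper[i].

    Returns True if two distinct index points have the same image T*j,
    i.e. the same processor and the same execution time.
    """
    seen = set()
    for j in product(*(range(u + 1) for u in upper)):
        image = tuple(sum(t * x for t, x in zip(row, j)) for row in T)
        if image in seen:
            return True
        seen.add(image)
    return False


# A 3-dimensional index set mapped into a 1-dimensional array (n = 3, k = 2).
T_bad = [[1, 1, 1], [1, 0, 0]]   # (0,1,0) and (0,0,1) both map to (1, 0)
T_good = [[1, 3, 9], [1, 0, 0]]  # the time row alone separates all points
print(has_conflict(T_bad, (2, 2, 2)))
print(has_conflict(T_good, (2, 2, 2)))
```

Enumeration is only practical for small index sets; the closed-form conditions described in the abstract avoid it.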
Partitioning of Uniform Dependency Algorithms for Parallel Execution on MIMD/Systolic Systems
An algorithm can be modeled as an index set and a set of dependence vectors. Each index vector in the index set indexes a computation of the algorithm. If the execution of a computation depends on the execution of another computation, then this dependency is represented as the difference between the index vectors of the computations. The dependence matrix is the matrix whose columns are the dependence vectors. An independent partition of the index set is one in which there are no dependencies between computations that belong to different blocks of the partition. This report considers uniform dependence algorithms with an arbitrary index set and proposes two very simple methods to find independent partitions of the index set. Each method has advantages over the other for certain kinds of applications, and both outperform previously proposed approaches in terms of computational complexity and/or optimality. Also, lower and upper bounds on the cardinality of the maximal independent partitions are given. For some algorithms it is shown that the cardinality of the maximal partition is equal to the greatest common divisor of some subdeterminants of the dependence matrix. In an MIMD/multiple systolic array computation environment, if different blocks of an independent partition are assigned to different processors/arrays, the communications between processors/arrays are reduced to zero. This is significant because communications usually dominate the overhead in MIMD machines. Some issues of mapping partitioned algorithms into MIMD/systolic systems are addressed. Based on the theory of partitioning, a new method is proposed to test whether a system of linear Diophantine equations has integer solutions.
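The notion of an independent partition can be made concrete with a small Python sketch. It is not either of the two methods proposed in the report: it groups index points by brute force, using a union-find structure to merge any two points that differ by a dependence vector. The function name `independent_blocks` and the box-shaped index set are assumptions made for illustration.

```python
from itertools import product


def independent_blocks(deps, upper):
    """Group an assumed box index set 0 <= j_i <= upper[i] into blocks
    such that no dependence vector links points in different blocks.

    deps  : list of dependence vectors (columns of the dependence matrix).
    upper : upper bounds of the box index set.
    """
    points = list(product(*(range(u + 1) for u in upper)))
    parent = {p: p for p in points}

    def find(p):
        # Follow parent links to the block representative,
        # shortening the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p in points:
        for d in deps:
            q = tuple(a + b for a, b in zip(p, d))
            if q in parent:  # q lies inside the index set
                parent[find(p)] = find(q)

    blocks = {}
    for p in points:
        blocks.setdefault(find(p), []).append(p)
    return list(blocks.values())


# Dependence vectors (2,0) and (0,2) on a 4x4 index set.
print(len(independent_blocks([(2, 0), (0, 2)], (3, 3))))   # 4 blocks
# Dependence vectors (1,1) and (1,-1) on the same index set.
print(len(independent_blocks([(1, 1), (1, -1)], (3, 3))))  # 2 blocks
```

In these two examples the number of blocks equals the absolute value of the determinant of the 2x2 dependence matrix (4 and 2), which is consistent with the subdeterminant result mentioned in the abstract; the sketch does not establish that relation in general.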
On the Efficient Design and Implementation of Systolic Structures
Computing and Information Scienc
Mapping of recursive algorithms onto multi-rate arrays
In this dissertation, multi-rate array (MRA) architecture and its synthesis are proposed and developed. Using multi-coordinate systems (MCS), a unified theory for mapping algorithms from their original algorithmic specifications onto multi-rate arrays is developed.
A multi-rate array is a grid of processors in which each interconnection may have its own clock rate; operations with different complexities run at their own clock rate, thus increasing the throughput and efficiency.
A class of algorithms named directional affine recurrence equations (DARE) is defined. The dependence space of a DARE can be decomposed into uniform and non-uniform subspaces. When projected along the non-uniform subspace, the resultant array structure is regular. Limitations and restrictions of this approach are investigated and a procedure for mapping DARE onto MRA is developed.
To generalize this approach, a synthesis theory is developed whose initial specification is affine direct input output (ADIO), which aims at removing redundancies from algorithms. Most ADIO specifications are the original algorithmic specifications. A multi-coordinate system (MCS) is used to present an algorithm's dependence structures. In an MCS, the index spaces of the variables in an algorithm are defined relative to their own coordinate systems. Most traditionally considered irregular algorithms present regular dependence structures under the MCS technique. Procedures are provided for transforming algorithms from original algorithmic specifications to their regular specifications.
Multi-rate schedules and multi-rate timing functions are studied. The solution for multi-rate timing functions can be formulated as linear programming problems. Procedures are provided for mapping ADIOs onto multi-rate VLSI systems. Examples are provided to illustrate the synthesis of MRAs from DAREs and ADIOs.
The first major contribution of this dissertation is the development of concrete, executable MRA architectures. The second is the introduction of the MCS and its application in the development of the theory for synthesizing MRAs from original algorithmic specifications.