51 research outputs found
Expanded delta networks for very large parallel computers
In this paper we analyze a generalization of the traditional delta network, introduced by Patel [21], and dubbed Expanded Delta Network (EDN). These networks generally provide multiple paths that can be exploited to reduce contention in the network, resulting in increased performance. The crossbar and traditional delta networks are limiting cases of this class of networks. However, the delta network does not provide the multiple paths that the more general expanded delta networks provide, and crossbars are too costly to use for large networks. The EDNs are analyzed with respect to their routing capabilities in the MIMD and SIMD models of computation. The concepts of capacity and clustering are also addressed. In massively parallel SIMD computers, the trend is to put a larger number of processors on a chip, but due to I/O constraints only a subset of the total number of processors may have access to the network. This is introduced as a Restricted Access Expanded Delta Network, of which the MasPar MP-1 router network is an example
The effect of FPU architecture on a dynamic precision algorithm for the solution of differential equations
Solution of Initial Value Problems (IVPs) is an important application in scientific computing. Methods for solving these problems use techniques for reducing the error and increasing the speed of the computation. This paper introduces a class of algorithms which dynamically reconfigure their operating parameters to reduce the computation time. By dynamically varying the precision of the arithmetic being performed, it is possible to obtain dramatic speedups on certain architectures when solving IVPs. This paper illustrates how various architectures affect a dynamic precision version of the Runge-Kutta-Fehlberg algorithm. It is shown that a speedup of over 30 percent is possible for both massively parallel processors and vector supercomputers
Self-routing lowest common ancestor networks
Multistage interconnection networks (MIN's) allow communication between terminals on opposing sides of a network. Lowest Common Ancestor Networks (LCAN's) [1] have switches capable of connecting bi-directional links in a permutation pattern that additionally permits communication between terminals on the same side. Self-routing LCAN's have interesting permutation routing capabilities and are highly partitionable. This paper characterizes self-routing LCAN's and analyzes their permutation routing capabilities. It is shown that the routing network of the CM-5 is a particular instance of an LCAN
Shortest paths in orthogonal graphs
Orthogonal graphs were introduced as a simple but powerful tool for the description and analysis of a class of interconnection networks. Routing, and hence finding shortest paths between any two nodes of an orthogonal graph, becomes an important problem. It is shown in this paper that routing in this class of graphs reduces to a node covering problem in the bipartite coverage graph of the orthogonal graph. A minimum cover clearly leads to a shortest path. In general, the problem of finding the minimum node cover in a bipartite graph is NP-complete. However, the bipartite coverage graphs corresponding to orthogonal graphs have a regular pattern of edges. This allows the development of a routing algorithm which results in a minimum cover. The procedure executes in polynomial time in the number of bit-nodes of the bipartite graph. It therefore results in a shortest path algorithm whose time complexity is quadratic in the logarithm of the number of nodes in the original orthogonal graph
Lowest common ancestor interconnection networks
Lowest Common Ancestor (LCA) networks are built using switches capable of connecting $u + d$ inputs/outputs in a permutation pattern. For $n$ source nodes and $l$ stages of switches, $n/d$ switches are used in stage $l-1$, $(n/d)(u/d)$ in stage $l-2$, and in general $n u^{l-i-1}/d^{l-i}$ switches in stage $i$. The resulting hierarchical structure possesses interesting connectivity and permutational properties. A full characterization of LCA networks is presented together with a permutation routing algorithm for a family of LCA networks. The algorithm uses the network itself to collect and disseminate information about the permutation. A schedule of $O(dp \log_{d/u} n)$ passes is obtained with a switch set-up cost factor of $O(\log_{d/u} n)$ ($p$ is the minimum number of passes that an algorithm with global knowledge schedules)
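As a rough illustration of the per-stage switch count, the sketch below tabulates it for small parameter values. It assumes the general term reads n·u^(l−i−1)/d^(l−i) for stage i, with stages numbered 0 to l−1; the function name `lca_switch_counts` and the sample parameters are hypothetical and do not come from the paper.

```python
def lca_switch_counts(n, u, d, stages):
    """Return {stage index: switch count} for an LCA network.

    Assumption: stage i holds n * u**(stages - i - 1) / d**(stages - i)
    switches, with stages numbered 0 .. stages - 1.
    """
    counts = {}
    for i in range(stages):
        numerator = n * u ** (stages - i - 1)
        denominator = d ** (stages - i)
        if numerator % denominator != 0:
            # The count must be a whole number of switches.
            raise ValueError(f"stage {i}: {numerator}/{denominator} is not an integer")
        counts[i] = numerator // denominator
    return counts


if __name__ == "__main__":
    # Hypothetical example: 64 source nodes, u = 2, d = 4, three stages.
    for stage, count in sorted(lca_switch_counts(64, 2, 4, 3).items(), reverse=True):
        print(f"stage {stage}: {count} switches")
```

With u smaller than d, the counts shrink by a factor of u/d from each stage to the one below it in index, which matches the hierarchical structure the abstract describes.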
An Analysis of Diffusive Load-Balancing
Diffusion is a well-known algorithm for load-balancing in which tasks move from heavily-loaded processors to lightly-loaded neighbors. This paper presents a rigorous analysis of the performance of the diffusion algorithm on arbitrary networks. We derive both lower and upper bounds on the running time of the algorithm. These bounds are stated in terms of the network's bandwidth. For the case of the generalized mesh with wrap-around (which includes common networks like the ring, 2D-torus, 3D-torus and hypercube), we derive tighter bounds and conclude that the diffusion algorithm is inefficient for lower dimensional meshes.
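The abstract does not give the exact update rule, so the following is only a minimal sketch of one common form of diffusion on a ring: in each round, every processor exchanges a fixed fraction `alpha` of the load difference with each of its two neighbors. The names `diffusion_step` and `rounds_to_balance`, the value of `alpha`, and the stopping tolerance are all assumptions made for illustration.

```python
def diffusion_step(loads, alpha=0.25):
    """One synchronous diffusion round on a ring (assumes len(loads) >= 3).

    Each processor i moves alpha * (loads[j] - loads[i]) across the link to
    each ring neighbor j, so load flows from heavier to lighter nodes and the
    total load is conserved.
    """
    n = len(loads)
    new_loads = list(loads)
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            new_loads[i] += alpha * (loads[j] - loads[i])
    return new_loads


def rounds_to_balance(loads, alpha=0.25, tol=1e-3, max_rounds=100_000):
    """Repeat diffusion rounds until the max-min load spread is at most tol."""
    rounds = 0
    while max(loads) - min(loads) > tol and rounds < max_rounds:
        loads = diffusion_step(loads, alpha)
        rounds += 1
    return rounds, loads


if __name__ == "__main__":
    # All load starts on one processor; compare ring sizes.
    for n in (4, 8, 16, 32):
        start = [float(n)] + [0.0] * (n - 1)
        rounds, _ = rounds_to_balance(start)
        print(f"ring of {n}: {rounds} rounds")
```

Running the loop for growing ring sizes gives an informal feel for how the round count rises with the diameter of a low-dimensional network; it does not reproduce the paper's bounds.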
Selection of Optimal Computing Platforms through the Suitability Measure
Selection of spaceborne computing platforms requires balance among several competing factors. Traditional performance analysis techniques are ill-suited for this purpose due to their overriding concern with runtime. The suitability measure is a new approach that quantifies the match between a computing platform and a program. It analyzes a program at the opcode and control flow levels, and compares this to a machine's capability to support the unique characteristics of the program. In this paper we develop the suitability measure and a series of program analysis methods. Experimental results confirm that machines that provide a better match to the program yield a higher suitability score. We prove that loops provide the only contribution to the suitability value, and also that the number of loop iterations is irrelevant, leading to the conclusion that a single pass through a loop is sufficient to derive a suitability value