Expanded delta networks for very large parallel computers
In this paper we analyze a generalization of the traditional delta network introduced by Patel [21], which we call the Expanded Delta Network (EDN). In general, these networks provide multiple paths that can be exploited to reduce contention in the network, resulting in increased performance. The crossbar and the traditional delta network are limiting cases of this class of networks. However, the delta network does not provide the multiple paths that the more general EDNs provide, and crossbars are too costly to use for large networks. The EDNs are analyzed with respect to their routing capabilities in the MIMD and SIMD models of computation. The concepts of capacity and clustering are also addressed. In massively parallel SIMD computers, the trend is to put a larger number of processors on a chip, but due to I/O constraints only a subset of the processors may have access to the network. This situation is introduced as a Restricted Access Expanded Delta Network, of which the MasPar MP-1 router network is an example.
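The single-path limitation that EDNs relax can be seen in an Omega network, a well-known delta network built from 2x2 switches. The sketch below illustrates only the traditional delta case under stated assumptions, not the paper's EDN model: it assumes N = 2^k terminals, a perfect shuffle before every stage, and destination-tag routing in which stage s uses destination bit k-1-s. The names `omega_path` and `blocked_links` are hypothetical. Because every source/destination pair has exactly one path, two messages that need the same inter-stage link must contend for it.

```python
from collections import Counter


def omega_path(src, dst, k):
    """Link positions a message occupies after each stage of a 2**k-terminal
    Omega network, using destination-tag (self-)routing."""
    n = 1 << k
    pos = src
    path = []
    for s in range(k):
        # Perfect shuffle: rotate the k-bit position left by one.
        pos = ((pos << 1) | (pos >> (k - 1))) & (n - 1)
        # The 2x2 switch forwards to its upper or lower output according to
        # the destination bit, most significant bit first.
        bit = (dst >> (k - 1 - s)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def blocked_links(perm, k):
    """Number of inter-stage links requested by more than one message when
    the permutation perm (perm[src] = dst) is routed in a single pass."""
    usage = Counter()
    for src, dst in enumerate(perm):
        for stage, link in enumerate(omega_path(src, dst, k)):
            usage[(stage, link)] += 1
    return sum(1 for count in usage.values() if count > 1)
```

For example, with k = 2 the identity permutation routes without conflict, whereas `blocked_links([0, 2, 1, 3], 2)` reports two contended first-stage links; a network with multiple paths could send one message of each conflicting pair another way.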
Self-routing lowest common ancestor networks
Multistage interconnection networks (MINs) allow communication between terminals on opposite sides of a network. Lowest Common Ancestor Networks (LCANs) [1] have switches capable of connecting bidirectional links in a permutation pattern, which additionally permits communication between terminals on the same side. Self-routing LCANs have interesting permutation routing capabilities and are highly partitionable. This paper characterizes self-routing LCANs and analyzes their permutation routing capabilities. It is shown that the routing network of the CM-5 is a particular instance of an LCAN.
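Routing through a lowest common ancestor can be sketched for a simple hierarchy. The example assumes, hypothetically, that terminals are numbered so that every group of d consecutive terminals shares a first-level switch, every d such switches share a second-level switch, and so on; it ignores any choice among upward links, which is where an LCAN's multiple paths come from. A message climbs until its source and destination addresses agree in all remaining high-order base-d digits, then descends using the destination's digits as self-routing tags. `lca_route` is an illustrative name, not one from the paper.

```python
def lca_route(src, dst, d):
    """Return (levels_up, down_tags) for a message from src to dst in a
    d-ary hierarchy; down_tags[k] is the child port (0..d-1) taken at the
    k-th switch on the way down."""
    if d < 2:
        raise ValueError("switch fan-out d must be at least 2")
    levels_up = 0
    a, b = src, dst
    # Climb until both addresses fall under the same ancestor switch.
    while a != b:
        a //= d
        b //= d
        levels_up += 1
    # Descend: at each level pick the destination's base-d digit for that level.
    down_tags = [(dst // d**j) % d for j in range(levels_up - 1, -1, -1)]
    return levels_up, down_tags
```

For example, with d = 4, terminals 5 and 6 share a first-level switch, so `lca_route(5, 6, 4)` returns `(1, [2])`; a message between terminals on the same side never needs to cross the whole network.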
Reading list of selected PASM-related publications
Prepared for a chapter to be published in the forthcoming Encyclopedia of Parallel Computing by Springer Publishing Company. The Encyclopedia will contain a broad coverage of the field and will include entries on machine organization, programming, algorithms, and applications. The broad coverage, together with extensive pointers to the literature for in-depth study, is expected to make the Encyclopedia a useful reference tool in parallel computing.
Reconfiguration for Fault Tolerance and Performance Analysis
Architecture reconfiguration, the ability of a system to alter the active interconnection among modules, has a history of different purposes and strategies. Its purposes range from the relatively simple desire to formalize procedures that all processes have in common, to reconfiguration for improved fault tolerance, to reconfiguration for performance enhancement, either through simply maximizing system use or through sophisticated notions of wedding topology to the specific needs of a given process.
Strategies range from straightforward redundancy by means of an identical backup system to intricate structures employing multistage interconnection networks. The present discussion surveys the more important contributions to the development of reconfigurable architecture. The strategy here is, in a sense, to approach the field from a historical perspective, with the goal of developing a more coherent theory of reconfiguration. First, the Turing and von Neumann machines are discussed from the perspective of system reconfiguration, and it is seen that this important early theoretical work contains little that anticipates reconfiguration. Then some early developments in reconfiguration are analyzed, including the work of Estrin and associates on the fixed-plus-variable restructurable computer system, the attempt by Miller and Cocke to theorize about configurable computers, and the work of Reddi and Feustel on their restructurable computer system.
The discussion then focuses on the most sustained systems for fault tolerance and performance enhancement that have been proposed. An attempt is made to define fault tolerance and to investigate some of the strategies used to achieve it. By investigating four different systems (the Tandem computer, the C.vmp system, the Extra Stage Cube, and the Gamma network), the move from dynamic redundancy to reconfiguration is observed. Reconfiguration for performance enhancement is then discussed. After a survey of several proposals, the discussion focuses on the most sustained systems that have been proposed: PASM, the DC architecture, the Star local network, and the NYU Ultracomputer. The discussion is organized around a comparison of control, scheduling, communication, and network topology.
Finally, fault tolerance and performance enhancement are compared, in order to clarify the notion of reconfiguration and to reveal both the common ground between them and the areas in which they diverge. The conclusion attempts to derive from this survey and analysis some observations on the nature of reconfiguration, together with some remarks on areas that require further research.
Lowest common ancestor interconnection networks
Lowest Common Ancestor (LCA) networks are built using switches capable of connecting $u + d$ inputs/outputs in a permutation pattern. For $n$ source nodes and $l$ stages of switches, $\frac{n}{d}$ switches are used in stage $l-1$, $\frac{n}{d}\cdot\frac{u}{d}$ in stage $l-2$, and, in general, $\frac{n\,u^{l-i-1}}{d^{l-i}}$ switches in stage $i$. The resulting hierarchical structure possesses interesting connectivity and permutation properties. A full characterization of LCA networks is presented together with a permutation routing algorithm for a family of LCA networks. The algorithm uses the network itself to collect and disseminate information about the permutation. A schedule of $O(dp \log_{d/u} n)$ passes is obtained with a switch set-up cost factor of $O(\log_{d/u} n)$, where $p$ is the minimum number of passes scheduled by an algorithm with global knowledge.
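The per-stage switch count can be checked numerically. This sketch simply evaluates n·u^(l-i-1)/d^(l-i) for i = 0, ..., l-1 and rejects parameter choices that would need a fractional number of switches; the function name and the example parameters are hypothetical.

```python
def lca_switch_counts(n, u, d, l):
    """Switches per stage i = 0..l-1 of an LCA network with n source nodes,
    u up-links and d down-links per switch: n * u**(l-i-1) / d**(l-i)."""
    counts = []
    for i in range(l):
        numerator = n * u ** (l - i - 1)
        denominator = d ** (l - i)
        if numerator % denominator != 0:
            raise ValueError(f"stage {i} would need a fractional number of switches")
        counts.append(numerator // denominator)
    return counts
```

With n = 64, u = 2, d = 4 and l = 3 this gives [4, 8, 16]. As a consistency check, the 16 switches in stage l-1 expose 16·4 = 64 down-links, one per source node, and their 16·2 = 32 up-links match the 8·4 = 32 down-links of stage l-2.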
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to carry out a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of great interest to researchers. This work addresses several issues in parallel architectures and parallel algorithms for integrated vision systems.
Probabilistic structural mechanics research for parallel processing computers
Aerospace structures and spacecraft are complex assemblages of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory and on approximate analytical solution methods in random vibrations and structural reliability. Practical application of PSM methods has been hampered by their computationally intensive nature. Solving PSM problems requires repeated analyses of structures that are often large and exhibit nonlinear and/or dynamic response behavior. These methods are inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures, innovative control software, and new solution methodologies are needed to make the solution of large-scale PSM problems practical.
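The inherent parallelism of PSM methods is easiest to see in crude Monte Carlo reliability analysis, where each sample is an independent structural evaluation. The sketch below is a toy under loudly stated assumptions: a single limit state g = R - S with normally distributed resistance R and load effect S (hypothetical parameters), a closed-form g standing in for the expensive structural analysis, and one seeded random stream per worker. Threads only keep the example self-contained; in CPython they do not speed up this pure-Python loop, so a real implementation would distribute the batches across processes or processors.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist


def count_failures(seed, samples, mu_r, sd_r, mu_s, sd_s):
    """Run `samples` independent trials of the toy limit state g = R - S."""
    rng = random.Random(seed)  # one seeded stream per worker
    failures = 0
    for _ in range(samples):
        resistance = rng.gauss(mu_r, sd_r)
        load = rng.gauss(mu_s, sd_s)
        if resistance - load < 0:  # g < 0 means the component fails
            failures += 1
    return failures


def failure_probability(total_samples, workers, mu_r, sd_r, mu_s, sd_s):
    """Monte Carlo estimate of P(R - S < 0), split into equal batches."""
    per_worker = total_samples // workers
    if per_worker == 0:
        raise ValueError("need at least one sample per worker")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(count_failures, seed, per_worker, mu_r, sd_r, mu_s, sd_s)
            for seed in range(workers)
        ]
        failures = sum(f.result() for f in futures)
    return failures / (per_worker * workers)


def exact_failure_probability(mu_r, sd_r, mu_s, sd_s):
    """Closed form for independent normal R and S: Phi(-beta)."""
    beta = (mu_r - mu_s) / (sd_r ** 2 + sd_s ** 2) ** 0.5
    return NormalDist().cdf(-beta)
```

With R ~ N(10, 1) and S ~ N(7, 1), the reliability index is β = 3/√2 ≈ 2.12 and the exact failure probability is about 0.017, so an estimate from 100,000 samples split over four workers should land close to that value. Because the batches share nothing but the final sum, they could run on as many processors as are available.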
Multistage interconnection networks: improved routing algorithms and fault tolerance
Multistage interconnection networks used in multiprocessor systems are optimal in terms of the number of switching elements, but the routing algorithms used to set up these networks are suboptimal in terms of time. Network set-up time and reliability are the major factors that affect the performance of multistage interconnection networks. This work improves both the routing and the fault-tolerance capability of Benes and Clos networks. The permutation representation is examined, as are the Clos and Benes networks. A modified edge-coloring algorithm is applied to the regular bipartite multigraph that represents a Clos network. The looping and parallel looping algorithms are examined, and a modified Tree-Connected Computer is adopted to execute a bidirectional parallel looping algorithm for Benes networks. A new fault-tolerant Clos network is presented.
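The edge-coloring view of Clos routing can be sketched as follows. This is a plain textbook approach, not the modified algorithm of this work: the connections of a permutation form a regular bipartite multigraph between input and output switches, and by König's theorem its edges split into n perfect matchings, one per middle switch. The sketch assumes a three-stage Clos network with n middle switches and n terminals per edge switch, finds each matching with simple augmenting paths (Kuhn's method), and uses the hypothetical name `route_clos`.

```python
from collections import Counter


def route_clos(perm, n):
    """Assign every connection of a permutation to one of the n middle
    switches of a three-stage Clos network with n terminals per edge switch.
    perm[t] is the output terminal for input terminal t; returns middle,
    where middle[t] is the middle switch carrying connection t."""
    N = len(perm)
    if N % n != 0 or sorted(perm) != list(range(N)):
        raise ValueError("perm must be a permutation of 0..N-1 with N divisible by n")
    r = N // n  # number of input (and output) switches
    # Regular bipartite multigraph: one edge (input switch, output switch)
    # per connection; every vertex has degree n.
    remaining = Counter()
    terminals = {}
    for t in range(N):
        edge = (t // n, perm[t] // n)
        remaining[edge] += 1
        terminals.setdefault(edge, []).append(t)

    middle = [None] * N
    for color in range(n):
        # Perfect matching on the still-uncolored edges; one always exists
        # because removing a perfect matching keeps the multigraph regular.
        match_of_output = [None] * r

        def augment(i, visited):
            for j in range(r):
                if remaining[(i, j)] > 0 and j not in visited:
                    visited.add(j)
                    if match_of_output[j] is None or augment(match_of_output[j], visited):
                        match_of_output[j] = i
                        return True
            return False

        for i in range(r):
            augment(i, set())
        # The matched edges become connections through middle switch `color`.
        for j, i in enumerate(match_of_output):
            remaining[(i, j)] -= 1
            middle[terminals[(i, j)].pop()] = color
    return middle
```

Each middle switch then carries exactly one connection from every input switch and to every output switch, which is what makes the assignment conflict-free.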
A Frame Work for Parallel String Matching- A Computational Approach with Omega Model
Nowadays the parallel string matching problem attracts many researchers because of its importance in information retrieval systems. Although the problem is easily stated and many simple algorithms perform very well in practice, numerous works have been published on the subject and research is still very active. In this paper we propose an omega parallel computing model for parallel string matching. Experimental results show that, on a multiprocessor system, the omega-model implementation of the proposed parallel string matching algorithm can reduce string matching time by more than 40%.
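A common way to parallelize exact string matching is to split the text into chunks and let each worker report the matches that start inside its chunk, reading m - 1 extra characters past the chunk end (m being the pattern length) so that occurrences straddling a boundary are not lost. This is a generic data-parallel decomposition assumed for illustration; it is not the paper's omega model, and the function names are hypothetical. Python's built-in `str.find` does the per-chunk search, and threads keep the sketch short; for real speedups the chunks would go to separate processes or processors.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_chunk(text, pattern, start, end):
    """All positions p with start <= p < end where pattern occurs.
    The search window extends len(pattern) - 1 characters past `end`, so an
    occurrence that starts in this chunk but crosses its right boundary is
    still found, and no occurrence is reported by two chunks."""
    limit = end + len(pattern) - 1
    hits = []
    p = text.find(pattern, start, limit)
    while p != -1:
        hits.append(p)
        p = text.find(pattern, p + 1, limit)  # p + 1 also finds overlapping matches
    return hits


def parallel_find_all(text, pattern, workers=4):
    """Split text into at most `workers` contiguous chunks and search them concurrently."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    size = max(1, -(-len(text) // workers))  # ceiling division
    bounds = [(s, min(s + size, len(text))) for s in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: find_in_chunk(text, pattern, *b), bounds))
    return [p for part in parts for p in part]
```

For example, `parallel_find_all("abracadabra", "abra", workers=3)` returns `[0, 7]`; the second occurrence lies across the boundary between the second and third chunks and is still reported once.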
MppSoCGEN: A framework for automatic generation of mppSoC architecture
Automatic code generation is a standard method in software engineering, since it improves code consistency and reduces overall development time. In this context, this paper presents a design flow for automatic VHDL code generation of mppSoC (massively parallel processing System-on-Chip) configurations. Because the configuration depends on the application requirements, a framework named MppSoCGEN was developed on the NetBeans Platform to accelerate the design process of complex mppSoCs. Starting from a set of architecture parameters, VHDL code is generated automatically using a parsing method. Configuration rules are proposed to ensure a correct and valid VHDL syntax configuration. Finally, Processor Element and network topology models of the mppSoC architecture are generated automatically for the Stratix II device family. The framework demonstrates its flexibility on NetBeans 5.5 running on a Centrino Duo Core 2 GHz machine, with 22 Kbytes and an average runtime of 3 seconds. Experimental results for a reduction algorithm validate the MppSoCGEN design flow and demonstrate the efficiency of the generated architectures.
Comment: 16 pages; International Journal of Computer Science & Information Technology (IJCSIT) Vol 4, No 2, April 201
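The parameter-to-VHDL step can be sketched with template substitution. Everything here is hypothetical and assumed for illustration rather than taken from MppSoCGEN: the template, the generic names `NUM_PE` and `DATA_WIDTH`, the topology names, and the configuration rules (a VHDL basic identifier as the entity name, a data width of 8, 16, or 32 bits, and a square number of PEs for a mesh). The point is the flow: check the parameters against the rules first, then fill a VHDL template, so that only configurations passing the rules are ever emitted.

```python
import math
import re
from string import Template

# Hypothetical top-level template; $-placeholders are filled from parameters.
TOP_TEMPLATE = Template("""\
library ieee;
use ieee.std_logic_1164.all;

-- topology: ${topology}
entity ${name} is
  generic (
    NUM_PE     : positive := ${num_pe};
    DATA_WIDTH : positive := ${data_width}
  );
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;
    result : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity ${name};
""")

# VHDL basic identifier: starts with a letter, no "__", no trailing "_".
VHDL_IDENTIFIER = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")


def check_config(name, num_pe, data_width, topology):
    """Apply the (hypothetical) configuration rules; raise on the first violation."""
    if not VHDL_IDENTIFIER.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid VHDL identifier")
    if num_pe < 1:
        raise ValueError("at least one processor element is required")
    if data_width not in (8, 16, 32):
        raise ValueError("data width must be 8, 16 or 32 bits")
    if topology not in ("mesh", "ring", "crossbar"):
        raise ValueError(f"unknown topology {topology!r}")
    if topology == "mesh" and math.isqrt(num_pe) ** 2 != num_pe:
        raise ValueError("a mesh needs a square number of processor elements")


def generate_top(name, num_pe, data_width, topology):
    """Validate the parameters, then emit the top-level VHDL entity."""
    check_config(name, num_pe, data_width, topology)
    return TOP_TEMPLATE.substitute(
        name=name, num_pe=num_pe, data_width=data_width, topology=topology
    )
```

For example, `generate_top("mppsoc_top", 16, 32, "mesh")` produces an entity with `NUM_PE` defaulting to 16, while a 12-PE mesh is rejected before any VHDL is written.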