206 research outputs found

    Routing with locality in partitioned-bus meshes

    Get PDF
    We show that adding partitioned-buses (as opposed to long buses that span an entire row or column) to ordinary meshes can reduce the routing time by approximately one-third for permutation routing with locality. A matching time lower bound is also proved. The result can be generalized to multi-packet routing.published_or_final_versio

    A Comparison of Meshes With Static Buses and Unidirectional Wrap-Arounds

    Get PDF
    We investigate the relative computational powers of a mesh with static buses and a mesh with unidirectional wrap-mounds. A mesh with unidirectional wraparounds is a torus with the restriction that any wraparoundlink of the architecture can only transmit data in one of the two directions at any clock tick. We show that the problem of packet routing can be solved as efficiently on a linear array with unidirectional wrap-around link as on a linear array with a broadcast bus. We also present a routing algorithm for a twcdimensional torus with unidirectional wraparound links whose run time is close to that of the best known algorithm for routing on a mesh with broadcast buses in each dimension. In addition, we show that on a mesh with broadcast buses, sorting can be done in time that is essentially the same as the time needed for packet routing

    Mesh Connected Computers With Multiple Fixed Buses: Packet Routing, Sorting and Selection

    Get PDF
    Mesh connected computers have become attractive models of computing because of their varied special features. In this paper we consider two variations of the mesh model: 1) a mesh with fixed buses, and 2) a mesh with reconfigurable buses. Both these models have been the subject matter of extensive previous research. We solve numerous important problems related to packet routing, sorting, and selection on these models. In particular, we provide lower bounds and very nearly matching upper bounds for the following problems on both these models: 1) Routing on a linear array; and 2) k-k routing, k-k sorting, and cut through routing on a 2D mesh for any k ≥ 12. We provide an improved algorithm for 1-1 routing and a matching sorting algorithm. In addition we present greedy algorithms for 1-1 routing, k-k routing, cut through routing, and k-k sorting that are better on average and supply matching lower bounds. We also show that sorting can be performed in logarithmic time on a mesh with fixed buses. As a consequence we present an optimal randomized selection algorithm. In addition we provide a selection algorithm for the mesh with reconfigurable buses whose time bound is significantly better than the existing ones. Our algorithms have considerably better time bounds than many existing best known algorithms

    Randomized Routing and Sorting on the Reconfigurable Mesh

    Get PDF
    In this paper we demonstrate the power of reconfiguration by presenting efficient randomized algorithms for both packet routing and sorting on a reconfigurable mesh connected computer (referred to simply as the mesh from hereon). The run times of these algorithms are better than the best achievable time bounds on a conventional mesh. In particular, we show that permutation routing problem can be solved on a linear array of size n in 3/4n steps, whereas n-1 is the best possible run time without reconfiguration. We also show that permutation routing on an n x n reconfigurable mesh can be done in time n + o(n)using a randomized algorithm or in time 1.25n + o(n) deterministically. In contrast, 2n-2 is the diameter of a conventional mesh and hence routing and sorting will need at least 2n-2 steps on a conventional mesh. In addition we show that the problem of sorting can be solved in time n+ o(n). All these time bounds hold with high probability. The bisection lower bound for both sorting and routing on the mesh is n/2, and hence our algorithms have nearly optimal time bounds

    Simulations and Algorithms on Reconfigurable Meshes With Pipelined Optical Buses.

    Get PDF
    Recently, many models using reconfigurable optically pipelined buses have been proposed in the literature. A system with an optically pipelined bus uses optical waveguides, with unidirectional propagation and predictable delays, instead of electrical buses to transfer information among processors. These two properties enable synchronized concurrent access to an optical bus in a pipelined fashion. Combined with the abilities of the bus structure to broadcast and multicast, this architecture suits many communication-intensive applications. We establish the equivalence of three such one-dimensional optical models, namely the LARPBS, LPB, and POB. This implies an automatic translation of algorithms (without loss of speed or efficiency) among these models. In particular, since the LPB is the same as an LARPBS without the ability to segment its buses, their equivalence establishes reconfigurable delays (rather than segmenting ability) as the key to the power of optically pipelined models. We also present simulations for a number of two-dimensional optical models and establish that they possess the same complexity, so that any of these models can simulate a step of one of the other models in constant time with a polynomial increase in size. Specifically, we determine the complexity of three two-dimensional optical models (the PR-Mesh, APPBS, and AROB) to be the same as the well known LR-Mesh and the cycle-free LR-Mesh. We develop algorithms for the LARPBS and PR-Mesh that are more efficient than existing algorithms in part by exploiting the pipelining, segmenting, and multicasting characteristics of these models. We also consider the implications of certain physical constraints placed on the system by restricting the distance over which two processors are able to communicate. All algorithms developed for these models assume that a healthy system is available. We present some fundamental algorithms that are able to tolerate up to N/2 faults on an N-processor LARPBS. We then extend these results to apply to other algorithms in the areas of image processing and matrix operations

    Design and Analysis of Optical Interconnection Networks for Parallel Computation.

    Get PDF
    In this doctoral research, we propose several novel protocols and topologies for the interconnection of massively parallel processors. These new technologies achieve considerable improvements in system performance and structure simplicity. Currently, synchronous protocols are used in optical TDM buses. The major disadvantage of a synchronous protocol is the waste of packet slots. To offset this inherent drawback of synchronous TDM, a pipelined asynchronous TDM optical bus is proposed. The simulation results show that the performance of the proposed bus is significantly better than that of known pipelined synchronous TDM optical buses. Practically, the computation power of the plain TDM protocol is limited. Various extensions must be added to the system. In this research, a new pipelined optical TDM bus for implementing a linear array parallel computer architecture is proposed. The switches on the receiving segment of the bus can be dynamically controlled, which make the system highly reconfigurable. To build large and scalable systems, we need new network architectures that are suitable for optical interconnections. A new kind of reconfigurable bus called segmented bus is introduced to achieve reduced structure simplicity and increased concurrency. We show that parallel architectures based on segmented buses are versatile by showing that it can simulate parallel communication patterns supported by a wide variety of networks with small slowdown factors. New kinds of interconnection networks, the hypernetworks, have been proposed recently. Compared with point-to-point networks, they allow for increased resource-sharing and communication bandwidth utilization, and they are especially suitable for optical interconnects. One way to derive a hypernetwork is by finding the dual of a point-to-point network. Hypercube Q\sb{n}, where n is the dimension, is a very popular point-to-point network. It is interesting to construct hypernetworks from the dual Q\sbsp{n}{*} of hypercube of Q\sb{n}. In this research, the properties of Q\sbsp{n}{*} are investigated and a set of fundamental data communication algorithms for Q\sbsp{n}{*} are presented. The results indicate that the Q\sbsp{n}{*} hypernetwork is a useful and promising interconnection structure for high-performance parallel and distributed computing systems

    An improved generalization of mesh-connected computers with multiple buses

    Get PDF
    ©2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.Mesh-connected computers (MCCs) are a class of important parallel architectures due to their simple and regular interconnections. However, their performances are restricted by their large diameters. Various augmenting mechanisms have been proposed to enhance the communication efficiency of MCCs. One major approach is to add nonconfigurable buses for improved broadcasting. A typical example is the mesh-connected computer with multiple buses (MMB). We propose a new class of generalized MMBs, the improved generalized MMBs (IMMBs). We compare IMMBs with MMBs and a class of previously proposed generalized MMBs (GMMBs). We show the power of IMMBs by considering semigroup and prefix computations. Specifically, as our main result we show that for any constant 0<&epsiv;<1, one can construct an N½×N½ square IMMB using which semigroup and prefix computations on N operands can be carried out in O(N&epsiv;) time, while maintaining O(1) broadcasting time. Compared with the previous best complexities O(N&frac18;) and O(N&frac116;) achieved on a rectangular MMB and GMMB, respectively, for the same computations, our results show that IMMBs are more powerful than MMBs and GMMBsYi Pen; Zheng, S.Q.; Keqin Li; Hong She

    On implementing dynamically reconfigurable architectures

    Get PDF
    Dynamically reconfigurable architectures have the ability to change their structure at each step of a computation. This dissertation studies various aspects of implementing dynamic reconfiguration, ranging from hardware building blocks and low-level architectures to modeling issues and high-level algorithm design. First we derive conditions under which classes of communication sets can be optimally scheduled on the circuit-switched tree (CST). Then we present a method to configure the CST to perform in constant time all communications scheduled for a step. This results in a constant time implementation of a step of a segmentable bus, a fundamental dynamically reconfigurable structure. We introduce a new bus delay measure (bends-cost) and define the bends-cost LR-Mesh; the LR-Mesh is a widely used reconfigurable model. Unlike the (idealized) LR-Mesh, which ignores bus delay, the bends-cost LR-Mesh uses the number of bends in a bus to estimate its delay. We present an implementation for which the bends-cost is an accurate estimate of the actual delay. We present algorithms to simulate various LR-Mesh configuration classes on the bends-cost LR-Mesh. For semimonotonic configurations, a Θ(N)*Θ(N) bends-cost LR-Mesh with bus delay at most D can simulate a step of the idealized N*N LR-Mesh in O((log N/(log D-log Δ))2) time (where Δ is the delay of an N-element segmentable bus), while employing about the same number of processors. For some special cases this time reduces to O(log N/(log D-log Δ)). If D=Nε, for an arbitrarily small constant ε \u3e 0, then the running times of bends-cost LR-Mesh algorithms are within a constant of their idealized counterparts. We also prove that with a polynomial blowup in the number of processors and D=Nε, the bends-cost LR-Mesh can simulate any step of an idealized LR-Mesh in constant time, thereby establishing that these models have the same power. We present an implementation (in VHDL) of the Enhanced Self Reconfigurable Gate Array (E-SRGA) architecture and perform a cost-benefit study for different dynamic reconfiguration features. This study shows our approach to be feasible
    • …
    corecore