
    Routing with locality in partitioned-bus meshes

    We show that adding partitioned buses (as opposed to long buses that span an entire row or column) to ordinary meshes can reduce the routing time by approximately one-third for permutation routing with locality. A matching time lower bound is also proved. The result can be generalized to multi-packet routing.
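
A small, purely illustrative cost model (not taken from the paper, and ignoring contention) shows why partitioned buses help packets with locality: a packet that must travel distance d on a plain linear array needs d link steps, whereas with buses of segment length b it can cross each full segment in one bus step. The function names and the model itself are assumptions for illustration only.

```python
def steps_plain(d):
    """Link steps for a packet travelling distance d on a plain linear array."""
    return d

def steps_partitioned_bus(d, b):
    """Toy model: each full bus segment of length b is crossed in one bus step;
    the leftover distance is covered over ordinary neighbour links."""
    return d // b + d % b

d, b = 12, 4
print(steps_plain(d), steps_partitioned_bus(d, b))  # 12 link steps vs 3 steps
```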

    Mesh Connected Computers With Multiple Fixed Buses: Packet Routing, Sorting and Selection

    Mesh connected computers have become attractive models of computing because of their varied special features. In this paper we consider two variations of the mesh model: 1) a mesh with fixed buses, and 2) a mesh with reconfigurable buses. Both these models have been the subject of extensive previous research. We solve numerous important problems related to packet routing, sorting, and selection on these models. In particular, we provide lower bounds and very nearly matching upper bounds for the following problems on both these models: 1) routing on a linear array; and 2) k-k routing, k-k sorting, and cut-through routing on a 2D mesh for any k ≥ 12. We provide an improved algorithm for 1-1 routing and a matching sorting algorithm. In addition, we present greedy algorithms for 1-1 routing, k-k routing, cut-through routing, and k-k sorting that are better on average, and supply matching lower bounds. We also show that sorting can be performed in logarithmic time on a mesh with fixed buses. As a consequence, we present an optimal randomized selection algorithm. In addition, we provide a selection algorithm for the mesh with reconfigurable buses whose time bound is significantly better than the existing ones. Our algorithms have considerably better time bounds than many existing best-known algorithms.
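
As background for the routing problems mentioned above, the sketch below shows the classic greedy (dimension-order) route on a 2D mesh: a packet first moves along its row to the destination column, then along that column. It is a generic illustration under the assumption of a single packet with no contention, queues, or buses; it is not the paper's improved algorithm.

```python
def greedy_path(src, dst):
    """Row-first greedy route on a 2D mesh: walk along the row to the target
    column, then along the column to the target row (single packet, no
    contention, no buses)."""
    (r, c), (tr, tc) = src, dst
    path = [(r, c)]
    while c != tc:
        c += 1 if tc > c else -1
        path.append((r, c))
    while r != tr:
        r += 1 if tr > r else -1
        path.append((r, c))
    return path

print(greedy_path((0, 0), (2, 3)))
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
```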

    Randomized Routing and Sorting on the Reconfigurable Mesh

    In this paper we demonstrate the power of reconfiguration by presenting efficient randomized algorithms for both packet routing and sorting on a reconfigurable mesh connected computer (referred to simply as the mesh from here on). The run times of these algorithms are better than the best achievable time bounds on a conventional mesh. In particular, we show that the permutation routing problem can be solved on a linear array of size n in (3/4)n steps, whereas n-1 is the best possible run time without reconfiguration. We also show that permutation routing on an n x n reconfigurable mesh can be done in time n + o(n) using a randomized algorithm, or in time 1.25n + o(n) deterministically. In contrast, 2n-2 is the diameter of a conventional mesh, and hence routing and sorting need at least 2n-2 steps on a conventional mesh. In addition, we show that the problem of sorting can be solved in time n + o(n). All these time bounds hold with high probability. The bisection lower bound for both sorting and routing on the mesh is n/2, and hence our algorithms have nearly optimal time bounds.
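
The primitive these algorithms build on is the ability of processors to fuse or split their local bus segments so that a one-step broadcast reaches exactly one segment. The toy class below models only that primitive (the names and interface are hypothetical), not the (3/4)n-step routing algorithm itself.

```python
class ReconfigurableLinearArray:
    """Toy model of a reconfigurable linear array of n processors.
    switches[i] == True means the bus between processors i and i+1 is fused."""

    def __init__(self, n):
        self.n = n
        self.switches = [True] * (n - 1)

    def split(self, i):
        """Open the switch between processors i and i+1, segmenting the bus."""
        self.switches[i] = False

    def broadcast(self, src):
        """Processors reached by a one-step broadcast from src: everyone on
        the same contiguous bus segment."""
        lo = src
        while lo > 0 and self.switches[lo - 1]:
            lo -= 1
        hi = src
        while hi < self.n - 1 and self.switches[hi]:
            hi += 1
        return set(range(lo, hi + 1))

arr = ReconfigurableLinearArray(8)
arr.split(3)                 # segments {0..3} and {4..7}
print(arr.broadcast(1))      # {0, 1, 2, 3}
```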

    A Comparison of Meshes With Static Buses and Unidirectional Wrap-Arounds

    We investigate the relative computational powers of a mesh with static buses and a mesh with unidirectional wrap-arounds. A mesh with unidirectional wrap-arounds is a torus with the restriction that any wraparound link of the architecture can only transmit data in one of the two directions at any clock tick. We show that the problem of packet routing can be solved as efficiently on a linear array with a unidirectional wraparound link as on a linear array with a broadcast bus. We also present a routing algorithm for a two-dimensional torus with unidirectional wraparound links whose run time is close to that of the best known algorithm for routing on a mesh with broadcast buses in each dimension. In addition, we show that on a mesh with broadcast buses, sorting can be done in time that is essentially the same as the time needed for packet routing.
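
To make the comparison concrete, here is a toy hop-count model (an assumption for illustration, not the paper's algorithm) for a single packet on a linear array that also has one wraparound link carrying data only from node 0 to node n-1; the packet may use ordinary bidirectional links plus at most that one directed shortcut.

```python
def hops_linear(i, j):
    """Plain linear array: bidirectional neighbour links only."""
    return abs(i - j)

def hops_uni_wraparound(i, j, n):
    """Linear array plus one directed wraparound link 0 -> n-1 (toy model,
    single packet, no contention): take the shorter of the direct route and
    the route through the wraparound link."""
    via_wrap = i + 1 + (n - 1 - j)   # i -> 0, wrap to n-1, then n-1 -> j
    return min(abs(i - j), via_wrap)

n = 16
print(hops_linear(1, 14), hops_uni_wraparound(1, 14, n))   # 13 vs 3
print(hops_linear(14, 1), hops_uni_wraparound(14, 1, n))   # 13 vs 13 (wrong direction)
```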

    Turbo NOC: a framework for the design of Network On Chip based turbo decoder architectures

    This work proposes a general framework for the design and simulation of network-on-chip based turbo decoder architectures. Several parameters in the design space are investigated, namely the network topology, the parallelism degree, the rate at which messages are sent by processing nodes over the network, and the routing strategy. The main results of this analysis are: i) the most suited topologies for achieving high throughput with limited complexity overhead are the generalized de Bruijn and generalized Kautz topologies; ii) depending on the throughput requirements, different parallelism degrees, message injection rates, and routing algorithms can be used to minimize the network area overhead. Comment: submitted to IEEE Trans. on Circuits and Systems I (submission date 27 May 2009).
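
For reference, the sketch below builds adjacency lists for generalized de Bruijn and generalized Kautz (Imase-Itoh) digraphs using one common textbook definition; the paper's exact construction and parameters may differ.

```python
def gen_de_bruijn(n, d):
    """Generalized de Bruijn digraph GB(d, n): node i has arcs to
    (d*i + j) mod n for j = 0, ..., d-1 (one common definition)."""
    return {i: [(d * i + j) % n for j in range(d)] for i in range(n)}

def gen_kautz(n, d):
    """Generalized Kautz (Imase-Itoh) digraph GK(d, n): node i has arcs to
    (-d*i - j) mod n for j = 1, ..., d (one common definition)."""
    return {i: [(-d * i - j) % n for j in range(1, d + 1)] for i in range(n)}

print(gen_de_bruijn(8, 2))   # constant out-degree d, diameter about log_d(n)
print(gen_kautz(8, 2))
```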

    Efficient parallel processing with optical interconnections

    With the advances in VLSI technology, it is now possible to build chips which can each contain thousands of processors. The efficiency of such chips in executing parallel algorithms heavily depends on the interconnection topology of the processors. It is not possible to build a fully interconnected network of processors with constant fan-in/fan-out using electrical interconnections. Free-space optics is a remedy to this limitation. Qualities exclusive to the optical medium are its ability to be directed for propagation in free space and the property that optical channels can cross in space without any interference. In this thesis, we present an electro-optical interconnected architecture named Optical Reconfigurable Mesh (ORM). It is based on an existing optical model of computation. There are two layers in the architecture. The processing layer is a reconfigurable mesh and the deflecting layer contains optical devices to deflect light beams. ORM provides three types of communication mechanisms. The first is for arbitrary planar connections among sets of locally connected processors using the reconfigurable mesh. The second is for arbitrary connections among N of the processors using the electrical buses on the processing layer and N^2 fixed passive deflecting units on the deflection layer. The third is for arbitrary connections among any of the N^2 processors using the N^2 mechanically reconfigurable deflectors in the deflection layer. The third type of communication mechanism is significantly slower than the other two. Therefore, it is desirable to avoid reconfiguring this type of communication during the execution of the algorithms. Instead, the optical reconfiguration can be done before the execution of each algorithm begins. Determining a configuration suitable for the entire execution of a task is studied in this thesis. The basic data movements for each of the mechanisms are studied. Finally, to show the power of ORM, we use all three types of communication mechanisms in the first O(log N) time algorithm for finding the convex hulls of all figures in an N x N binary image presented in this thesis.
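
The third mechanism can be pictured as a fixed interconnection pattern chosen off-line: the deflectors are set once, before the algorithm runs, and then behave like a static permutation of the N^2 processors. The toy class below is a hypothetical illustration of that idea, not the thesis's actual ORM interface.

```python
class DeflectionLayer:
    """Toy model of ORM's slow, mechanically reconfigurable mechanism:
    the deflectors realize a fixed permutation of the N*N processors,
    configured once before execution and then used repeatedly."""

    def __init__(self, permutation):
        self.permutation = list(permutation)   # set up off-line, not per step

    def send(self, src):
        """Deliver a message from processor src through its fixed deflector."""
        return self.permutation[src]

# Example: configure a shuffle-like permutation of 16 processors (N = 4),
# then reuse it for every communication step of the algorithm.
N = 4
perm = [(2 * i) % (N * N - 1) if i < N * N - 1 else N * N - 1 for i in range(N * N)]
layer = DeflectionLayer(perm)
print([layer.send(i) for i in range(N * N)])
```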

    Simulations and Algorithms on Reconfigurable Meshes With Pipelined Optical Buses.

    Recently, many models using reconfigurable optically pipelined buses have been proposed in the literature. A system with an optically pipelined bus uses optical waveguides, with unidirectional propagation and predictable delays, instead of electrical buses to transfer information among processors. These two properties enable synchronized concurrent access to an optical bus in a pipelined fashion. Combined with the abilities of the bus structure to broadcast and multicast, this architecture suits many communication-intensive applications. We establish the equivalence of three such one-dimensional optical models, namely the LARPBS, LPB, and POB. This implies an automatic translation of algorithms (without loss of speed or efficiency) among these models. In particular, since the LPB is the same as an LARPBS without the ability to segment its buses, their equivalence establishes reconfigurable delays (rather than segmenting ability) as the key to the power of optically pipelined models. We also present simulations for a number of two-dimensional optical models and establish that they possess the same complexity, so that any of these models can simulate a step of one of the other models in constant time with a polynomial increase in size. Specifically, we determine the complexity of three two-dimensional optical models (the PR-Mesh, APPBS, and AROB) to be the same as the well-known LR-Mesh and the cycle-free LR-Mesh. We develop algorithms for the LARPBS and PR-Mesh that are more efficient than existing algorithms in part by exploiting the pipelining, segmenting, and multicasting characteristics of these models. We also consider the implications of certain physical constraints placed on the system by restricting the distance over which two processors are able to communicate. All algorithms developed for these models assume that a healthy system is available. We present some fundamental algorithms that are able to tolerate up to N/2 faults on an N-processor LARPBS. We then extend these results to apply to other algorithms in the areas of image processing and matrix operations.
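
The property exploited throughout is that a unidirectional optical bus with predictable delays lets every processor write in the same bus cycle without collisions. The toy function below shows only that end-to-end effect (one full permutation delivered per cycle); the slot and delay arithmetic of the actual LARPBS/LPB/POB models is abstracted away, so treat it as an illustrative assumption.

```python
def pipelined_bus_cycle(dests, payloads):
    """Toy model of one cycle on a pipelined optical bus: all processors
    inject a message simultaneously; unidirectional propagation with fixed
    delays gives each message its own time slot, so a permutation is
    delivered in a single bus cycle (slot arithmetic omitted)."""
    n = len(dests)
    assert sorted(dests) == list(range(n)), "toy model assumes a permutation"
    received = [None] * n
    for src, dst in enumerate(dests):
        received[dst] = (src, payloads[src])
    return received

print(pipelined_bus_cycle([2, 0, 3, 1], ["a", "b", "c", "d"]))
```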

    Flexible LDPC Decoder Architectures

    Flexible channel decoding is gaining significance with the increasing number of wireless standards and of modes within a standard. A flexible channel decoder is a solution providing inter-standard and intra-standard support without changes in hardware. However, the design of efficient implementations of flexible low-density parity-check (LDPC) code decoders satisfying area, speed, and power constraints is a challenging task and still requires considerable research effort. This paper provides an overview of the state of the art in the design of flexible LDPC decoders. The published solutions are evaluated at two levels of architectural design: the processing element (PE) and the interconnection structure. A qualitative and quantitative analysis of different design choices is carried out, and a comparison is provided in terms of achieved flexibility, throughput, decoding efficiency, and area (power) consumption.
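
As a concrete reference point for what a processing element typically computes, the sketch below implements the standard min-sum check-node update; it is a generic textbook kernel, not code from any of the surveyed decoders.

```python
def min_sum_check_node(llrs):
    """Min-sum check-node update: the outgoing message on each edge combines
    the sign product and the minimum magnitude of all *other* incoming LLRs."""
    outputs = []
    for k in range(len(llrs)):
        others = llrs[:k] + llrs[k + 1:]
        sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
        outputs.append(sign * min(abs(v) for v in others))
    return outputs

print(min_sum_check_node([0.8, -1.5, 2.3, -0.4]))
# [0.4, -0.4, 0.4, -0.8]
```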