102 research outputs found

    Simulations and Algorithms on Reconfigurable Meshes With Pipelined Optical Buses.

    Get PDF
    Recently, many models using reconfigurable optically pipelined buses have been proposed in the literature. A system with an optically pipelined bus uses optical waveguides, with unidirectional propagation and predictable delays, instead of electrical buses to transfer information among processors. These two properties enable synchronized concurrent access to an optical bus in a pipelined fashion. Combined with the abilities of the bus structure to broadcast and multicast, this architecture suits many communication-intensive applications. We establish the equivalence of three such one-dimensional optical models, namely the LARPBS, LPB, and POB. This implies an automatic translation of algorithms (without loss of speed or efficiency) among these models. In particular, since the LPB is the same as an LARPBS without the ability to segment its buses, their equivalence establishes reconfigurable delays (rather than segmenting ability) as the key to the power of optically pipelined models. We also present simulations for a number of two-dimensional optical models and establish that they possess the same complexity, so that any of these models can simulate a step of one of the other models in constant time with a polynomial increase in size. Specifically, we determine the complexity of three two-dimensional optical models (the PR-Mesh, APPBS, and AROB) to be the same as the well known LR-Mesh and the cycle-free LR-Mesh. We develop algorithms for the LARPBS and PR-Mesh that are more efficient than existing algorithms in part by exploiting the pipelining, segmenting, and multicasting characteristics of these models. We also consider the implications of certain physical constraints placed on the system by restricting the distance over which two processors are able to communicate. All algorithms developed for these models assume that a healthy system is available. We present some fundamental algorithms that are able to tolerate up to N/2 faults on an N-processor LARPBS. We then extend these results to apply to other algorithms in the areas of image processing and matrix operations

    Scaling Simulations of Reconfigurable Meshes.

    Get PDF
    This dissertation deals with reconfigurable bus-based models, a new type of parallel machine that uses dynamically alterable connections between processors to allow efficient communication and to perform fast computations. We focus this work on the Reconfigurable Mesh (R-Mesh), one of the most widely studied reconfigurable models. We study the ability of the R-Mesh to adapt an algorithm instance of an arbitrary size to run on a given smaller model size without significant loss of efficiency. A scaling simulation achieves this adaptation, and the simulation overhead expresses the efficiency of the simulation. We construct a scaling simulation for the Fusing-Restricted Reconfigurable Mesh (FR-Mesh), an important restriction of the R-Mesh. The overhead of this simulation depends only on the simulating machine size and not on the simulated machine size. The results of this scaling simulation extend to a variety of concurrent write rules and also translate to an improved scaling simulation of the R-Mesh itself. We present a bus linearization procedure that transforms an arbitrary non-linear bus configuration of an R-Mesh into an equivalent acyclic linear bus configuration implementable on an Linear Reconfigurable Mesh (LR-Mesh), a weaker version of the R-Mesh. This procedure gives the algorithm designer the liberty of using buses of arbitrary shape, while automatically translating the algorithm to run on a simpler platform. We illustrate our bus linearization method through two important applications. The first leads to a faster scaling simulation of the R-Mesh. The second application adapts algorithms designed for R-Meshes to run on models with pipelined optical buses. We also present a simulation of a Directional Reconfigurable Mesh (DR-Mesh) on an LR-Mesh. This simulation has a much better efficiency compared to previous work. In addition to the LR-Mesh, this simulation also runs on models that use pipelined optical buses

    Efficient parallel computation on multiprocessors with optical interconnection networks

    Get PDF
    This dissertation studies optical interconnection networks, their architecture, address schemes, and computation and communication capabilities. We focus on a simple but powerful optical interconnection network model - the Linear Array with Reconfigurable pipelined Bus System (LARPBS). We extend the LARPBS model to a simplified higher dimensional LAPRBS and provide a set of basic computation operations. We then study the following two groups of parallel computation problems on both one dimensional LARPBS\u27s as well as multi-dimensional LARPBS\u27s: parallel comparison problems, including sorting, merging, and selection; Boolean matrix multiplication, transitive closure and their applications to connected component problems. We implement an optimal sorting algorithm on an n-processor LARPBS. With this optimal sorting algorithm at disposal, we study the sorting problem for higher dimensional LARPBS\u27s and obtain the following results: • An optimal basic Columnsort algorithm on a 2D LARPBS. • Two optimal two-way merge sort algorithms on a 2D LARPBS. • An optimal multi-way merge sorting algorithm on a 2D LARPBS. • An optimal generalized column sort algorithm on a 2D LARPBS. • An optimal generalized column sort algorithm on a 3D LARPBS. • An optimal 5-phase sorting algorithm on a 3D LARPBS. Results for selection problems are as follows: • A constant time maximum-finding algorithm on an LARPBS. • An optimal maximum-finding algorithm on an LARPBS. • An O((log log n)2) time parallel selection algorithm on an LARPBS. • An O(k(log log n)2) time parallel multi-selection algorithm on an LARPBS. While studying the computation and communication properties of the LARPBS model, we find Boolean matrix multiplication and its applications to the graph are another set of problem that can be solved efficiently on the LARPBS. Following is a list of results we have obtained in this area. • A constant time Boolean matrix multiplication algorithm. • An O(log n)-time transitive closure algorithm. • An O(log n)-time connected components algorithm. • An O(log n)-time strongly connected components algorithm. The results provided in this dissertation show the strong computation and communication power of optical interconnection networks

    Parallel progressive multiple sequence alignment on reconfigurable meshes

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>One of the most fundamental and challenging tasks in bio-informatics is to identify related sequences and their hidden biological significance. The most popular and proven best practice method to accomplish this task is aligning multiple sequences together. However, multiple sequence alignment is a computing extensive task. In addition, the advancement in DNA/RNA and Protein sequencing techniques has created a vast amount of sequences to be analyzed that exceeding the capability of traditional computing models. Therefore, an effective parallel multiple sequence alignment model capable of resolving these issues is in a great demand.</p> <p>Results</p> <p>We design <it>O</it>(1) run-time solutions for both local and global dynamic programming pair-wise alignment algorithms on reconfigurable mesh computing model. To align <it>m </it>sequences with max length <it>n</it>, we combining the parallel pair-wise dynamic programming solutions with newly designed parallel components. We successfully reduce the progressive multiple sequence alignment algorithm's run-time complexity from <it>O</it>(<it>m </it>× <it>n</it><sup>4</sup>) to <it>O</it>(<it>m</it>) using <it>O</it>(<it>m </it>× <it>n</it><sup>3</sup>) processing units for scoring schemes that use three distinct values for match/mismatch/gap-extension. The general solution to multiple sequence alignment algorithm takes <it>O</it>(<it>m </it>× <it>n</it><sup>4</sup>) processing units and completes in <it>O</it>(<it>m</it>) time.</p> <p>Conclusions</p> <p>To our knowledge, this is the first time the progressive multiple sequence alignment algorithm is completely parallelized with <it>O</it>(<it>m</it>) run-time. We also provide a new parallel algorithm for the Longest Common Subsequence (LCS) with <it>O</it>(1) run-time using <it>O</it>(<it>n</it><sup>3</sup>) processing units. This is a big improvement over the current best constant-time algorithm that uses <it>O</it>(<it>n</it><sup>4</sup>) processing units.</p

    Reconfigurable microarchitectures at the programmable logic interface

    Get PDF

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    On implementing dynamically reconfigurable architectures

    Get PDF
    Dynamically reconfigurable architectures have the ability to change their structure at each step of a computation. This dissertation studies various aspects of implementing dynamic reconfiguration, ranging from hardware building blocks and low-level architectures to modeling issues and high-level algorithm design. First we derive conditions under which classes of communication sets can be optimally scheduled on the circuit-switched tree (CST). Then we present a method to configure the CST to perform in constant time all communications scheduled for a step. This results in a constant time implementation of a step of a segmentable bus, a fundamental dynamically reconfigurable structure. We introduce a new bus delay measure (bends-cost) and define the bends-cost LR-Mesh; the LR-Mesh is a widely used reconfigurable model. Unlike the (idealized) LR-Mesh, which ignores bus delay, the bends-cost LR-Mesh uses the number of bends in a bus to estimate its delay. We present an implementation for which the bends-cost is an accurate estimate of the actual delay. We present algorithms to simulate various LR-Mesh configuration classes on the bends-cost LR-Mesh. For semimonotonic configurations, a Θ(N)*Θ(N) bends-cost LR-Mesh with bus delay at most D can simulate a step of the idealized N*N LR-Mesh in O((log N/(log D-log Δ))2) time (where Δ is the delay of an N-element segmentable bus), while employing about the same number of processors. For some special cases this time reduces to O(log N/(log D-log Δ)). If D=Nε, for an arbitrarily small constant ε \u3e 0, then the running times of bends-cost LR-Mesh algorithms are within a constant of their idealized counterparts. We also prove that with a polynomial blowup in the number of processors and D=Nε, the bends-cost LR-Mesh can simulate any step of an idealized LR-Mesh in constant time, thereby establishing that these models have the same power. We present an implementation (in VHDL) of the Enhanced Self Reconfigurable Gate Array (E-SRGA) architecture and perform a cost-benefit study for different dynamic reconfiguration features. This study shows our approach to be feasible

    Reconfiguration of field programmable logic in embedded systems

    Get PDF
    corecore