187 research outputs found

    A constant time parallel algorithm for the triangularization of a sparse matrix using CD-PARBS

    An algorithm for the triangularization of a matrix whose graph is a directed acyclic graph (DAG) is presented. One algorithm for obtaining this special form was given by Sargent and Westerberg. Their approach works well in practice but is sequential in nature and cannot be parallelised easily. In this work we present a parallel algorithm based on the observation that, if we find the transitive closure matrix of a directed acyclic graph, count the number of entries in each row, sort the rows in ascending order of these counts and rank them accordingly, we get a lower triangular matrix. We show that all these operations can be done in constant time using a 3-d CD-PARBS (Complete Directed PARBS). The same approach can be used for the block cases, producing the same relabelling as Tarjan’s algorithm, in constant time. To the best of our knowledge, this is the first approach to solve such problems using directed PARBS.
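The observation in this abstract can be illustrated with a small sequential sketch. This is not the constant-time CD-PARBS algorithm from the paper; it only shows the closure, count, sort and rank steps on an ordinary machine. It assumes that a nonzero entry (i, j) stands for an edge i → j and that diagonal entries are ignored. The function names `triangularize`, `permute` and `is_lower_triangular` and the matrix `A` are hypothetical, and Warshall's algorithm is swapped in for the closure step.

```python
def triangularize(pattern):
    """Return a row/column order that makes a DAG-structured matrix lower triangular.

    pattern[i][j] is truthy when entry (i, j) is nonzero (assumed edge i -> j).
    Diagonal entries are ignored when building the graph.
    """
    n = len(pattern)
    # Reachability matrix, initialised from the off-diagonal nonzeros.
    reach = [[bool(pattern[i][j]) and i != j for j in range(n)] for i in range(n)]
    # Warshall's algorithm: sequential transitive closure in O(n^3).
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    # A node that reaches itself lies on a cycle, so the graph is not a DAG.
    if any(reach[i][i] for i in range(n)):
        raise ValueError("graph contains a cycle; not a DAG")
    # Count the entries in each row of the closure (nodes reachable from i).
    counts = [sum(row) for row in reach]
    # Sort nodes by ascending count; a node's position in this list is its rank.
    return sorted(range(n), key=lambda i: counts[i])


def permute(pattern, order):
    """Apply the same permutation to rows and columns."""
    return [[pattern[i][j] for j in order] for i in order]


def is_lower_triangular(matrix):
    """True when every entry above the diagonal is zero/falsy."""
    n = len(matrix)
    return all(not matrix[i][j] for i in range(n) for j in range(i + 1, n))


# Hypothetical 4x4 sparsity pattern with edges 0->2, 1->0, 3->1, 3->2.
A = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]
order = triangularize(A)
print(order)                                   # [2, 0, 1, 3]
print(is_lower_triangular(permute(A, order)))  # True
```

The ordering works because an edge i → j forces node i to reach everything j reaches plus j itself, so i always has a strictly larger count and is placed after j.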

    A fast parallel algorithm for special linear systems of equations using processor arrays with reconfigurable bus systems

    A parallel algorithm using Processor Arrays with Reconfigurable Bus Systems (PARBS) has been designed to solve dense Symmetric Positive Definite (SPD) systems of equations Ax = b. The key content of this report is the parallelisation of the algorithm by Delosme & Ipson [8]. In order to design a parallel algorithm for PARBS, many procedures involved in [8] are handled in a slightly different way. The parallel time and processor complexity of each step of the algorithm are calculated. The parallel time complexity is O(n) using 2n × 2n × 5n Processing Elements.

    Fast Ant Colony Optimization on Runtime Reconfigurable Processor Arrays

    Ant Colony Optimization (ACO) is a metaheuristic used to solve combinatorial optimization problems. As with other metaheuristics, like evolutionary methods, ACO algorithms often show good optimization behavior but are slow when compared to classical heuristics. Hence, there is a need to find fast implementations for ACO algorithms. In order to allow a fast parallel implementation, we propose several changes to a standard form of ACO algorithms. The main new features are the non-generational approach and the use of a threshold based decision function for the ants. We show that the new algorithm has a good optimization behavior and also allows a fast implementation on reconfigurable processor arrays. This is the first implementation of the ACO approach on a reconfigurable architecture. The running time of the algorithm is quasi-linear in the problem size n and the number of ants on a reconfigurable mesh with n^2 processors, each provided with only a constant number of memory words.

    Mesh Connected Computers With Multiple Fixed Buses: Packet Routing, Sorting and Selection

    Mesh connected computers have become attractive models of computing because of their varied special features. In this paper we consider two variations of the mesh model: 1) a mesh with fixed buses, and 2) a mesh with reconfigurable buses. Both these models have been the subject matter of extensive previous research. We solve numerous important problems related to packet routing, sorting, and selection on these models. In particular, we provide lower bounds and very nearly matching upper bounds for the following problems on both these models: 1) Routing on a linear array; and 2) k-k routing, k-k sorting, and cut through routing on a 2D mesh for any k ≥ 12. We provide an improved algorithm for 1-1 routing and a matching sorting algorithm. In addition we present greedy algorithms for 1-1 routing, k-k routing, cut through routing, and k-k sorting that are better on average and supply matching lower bounds. We also show that sorting can be performed in logarithmic time on a mesh with fixed buses. As a consequence we present an optimal randomized selection algorithm. In addition we provide a selection algorithm for the mesh with reconfigurable buses whose time bound is significantly better than the existing ones. Our algorithms have considerably better time bounds than many existing best known algorithms

    Scaling Simulations of Reconfigurable Meshes.

    This dissertation deals with reconfigurable bus-based models, a new type of parallel machine that uses dynamically alterable connections between processors to allow efficient communication and to perform fast computations. We focus this work on the Reconfigurable Mesh (R-Mesh), one of the most widely studied reconfigurable models. We study the ability of the R-Mesh to adapt an algorithm instance of an arbitrary size to run on a given smaller model size without significant loss of efficiency. A scaling simulation achieves this adaptation, and the simulation overhead expresses the efficiency of the simulation. We construct a scaling simulation for the Fusing-Restricted Reconfigurable Mesh (FR-Mesh), an important restriction of the R-Mesh. The overhead of this simulation depends only on the simulating machine size and not on the simulated machine size. The results of this scaling simulation extend to a variety of concurrent write rules and also translate to an improved scaling simulation of the R-Mesh itself. We present a bus linearization procedure that transforms an arbitrary non-linear bus configuration of an R-Mesh into an equivalent acyclic linear bus configuration implementable on a Linear Reconfigurable Mesh (LR-Mesh), a weaker version of the R-Mesh. This procedure gives the algorithm designer the liberty of using buses of arbitrary shape, while automatically translating the algorithm to run on a simpler platform. We illustrate our bus linearization method through two important applications. The first leads to a faster scaling simulation of the R-Mesh. The second application adapts algorithms designed for R-Meshes to run on models with pipelined optical buses. We also present a simulation of a Directional Reconfigurable Mesh (DR-Mesh) on an LR-Mesh. This simulation is much more efficient than previous work. In addition to the LR-Mesh, this simulation also runs on models that use pipelined optical buses.

    GRAPE-5: A Special-Purpose Computer for N-body Simulation

    We have developed a special-purpose computer for gravitational many-body simulations, GRAPE-5. GRAPE-5 is the successor of GRAPE-3. Both consist of eight custom pipeline chips (the G5 chip and the GRAPE chip, respectively). The differences between GRAPE-5 and GRAPE-3 are: (1) The G5 chip contains two pipelines operating at 80 MHz, while the GRAPE chip had one at 20 MHz. Thus, the calculation speeds of the G5 chip and the GRAPE-5 board are 8 times those of the GRAPE chip and the GRAPE-3 board. (2) The GRAPE-5 board adopted the PCI bus as the interface to the host computer instead of the VME bus of GRAPE-3, resulting in a communication speed one order of magnitude faster. (3) In addition to the pure 1/r potential, the G5 chip can calculate forces with arbitrary cutoff functions, so that it can be applied to Ewald or P^3M methods. (4) The pairwise force calculated on GRAPE-5 is about 10 times more accurate than that on GRAPE-3. On one GRAPE-5 board, one timestep of a 128k-body simulation with the direct summation algorithm takes 14 seconds. With the Barnes-Hut tree algorithm (theta = 0.75), one timestep of a 10^6-body simulation can be done in 16 seconds. Comment: 19 pages, 24 Postscript figures, 3 tables, Latex, submitted to Publications of the Astronomical Society of Japa
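The factor of 8 quoted in item (1) can be checked with a short calculation. It assumes that chip throughput scales with the number of pipelines times the clock frequency, and that both boards carry the same number of chips (eight, per the abstract), so the board-level ratio equals the chip-level ratio.

```latex
\[
\frac{\text{G5 chip speed}}{\text{GRAPE chip speed}}
  = \frac{2 \times 80\ \mathrm{MHz}}{1 \times 20\ \mathrm{MHz}}
  = \frac{160}{20}
  = 8
\]
```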

    Field Programmable Reconfigurable Mesh (FPRM)

    Many application areas demand increasing amounts of processing capabilities. FPGAs have been widely used for improving this performance. FPRM (Field Programmable Reconfigurable Mesh) is a technique we propose to improve FPGA performance. A Reconfigurable Mesh (RM) consists of a grid of Processing Elements that use dynamic reconfigurations to create varying bus segments between them. The RM can thus perform computations such as Sorting or Counting in a constant number of steps. It has long been speculated that the RM’s dynamic reconfigurations should replace the FPGA’s static reconfigurations. We show that the RM is capable of not only speeding up specific computations such as sorting or summing, but also of speeding up the evaluation of Boolean circuits (BCs), which is the main purpose of the FPGA. Our proposed RM algorithm can evaluate BCs without causing size blowup. Furthermore, tri-state switching elements can be used instead of PEs in a grid

    Distributed and Interactive Simulations Operating at Large Scale for Transcontinental Experimentation

    This paper addresses the use of emerging technologies to respond to the increasing needs for larger and more sophisticated agent-based simulations of urban areas. The U.S. Joint Forces Command has found it useful to seek out and apply technologies largely developed for academic research in the physical sciences. The use of these techniques in transcontinentally distributed, interactive experimentation has been shown to be effective and stable, and the analyses of the data find parallels in the behavioral sciences. The authors relate their decade and a half of experience in implementing high performance computing hardware, software and user interface architectures. These have enabled heretofore unachievable results. They focus on three advances: the use of general purpose graphics processing units as computing accelerators, the efficiencies derived from implementing interest managed routers in distributed systems, and the benefits of effective data management for the voluminous information.

    Design and process/measurement for immersed element control in a reconfigurable vertically falling soap film

    Thesis (S.B.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2007. Includes bibliographical references (p. 24-25). Reinforcement learning has proven successful at harnessing the passive dynamics of underactuated systems to achieve least energy solutions. However, coupled fluid-structural models are too computationally intensive for in-the-loop control in viscous flow regimes. My vertically falling soap film will provide a reconfigurable experimental environment for machine learning controllers. The real-time position and velocity data will be collected with a High Speed Video system, illuminated by a Low Pressure Sodium Lamp. Approximating lines of interference within the soap film to known pressure variations, controllers will shape downstream flow to desired conditions. Though accurate measurement still eludes those without Laser Doppler Velocimetry, order of magnitude Reynolds numbers can be estimated to describe the regime of controller inquiry. by John Glowa. S.B.