187 research outputs found
A constant time parallel algorithm for the triangularization of a sparse matrix using CD-PARBS
An algorithm for the triangularization of a matrix whose graph is a directed
acyclic graph, popularly known as dag, is presented. One of the algorithms for
obtaining this special form has been given by Sargent and Westerberg. Their
approach is practically good but sequential in nature and cannot be
parallelised easily. In this work we present a parallel algorithm which is
based on the observation that, if we find the transitive closure matrix of a
directed acyclic graph, count the number of entries in each row, sort them in
the ascending order of their values and rank them accordingly, we get a lower
triangular matrix. We show that all these operations can be done using 3-d
CD-PARBS (Complete Directed PARBS) in constant time. The same approach can be used
for the block cases, producing the same relabelling as produced by Tarjan’s
algorithm, in constant time. To the best of our knowledge, it is the first
approach to solve such problems using directed PARBS
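The observation in this abstract (transitive closure, row counts, ascending sort, ranking) can be illustrated with a short sequential sketch. This is not the constant-time CD-PARBS algorithm; it is a plain Python illustration that assumes an edge i -> j wherever the off-diagonal entry (i, j) is nonzero, and the function names are hypothetical.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a boolean adjacency matrix (sequential, O(n^3))."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def triangularizing_order(pattern):
    """Return a node order that makes a DAG-structured matrix lower triangular.

    Assumption: the graph of the matrix has an edge i -> j whenever
    pattern[i][j] is nonzero and i != j, and that graph is acyclic.
    """
    n = len(pattern)
    adj = [[bool(pattern[i][j]) and i != j for j in range(n)] for i in range(n)]
    reach = transitive_closure(adj)
    # Number of entries in each row of the closure = nodes reachable from i.
    counts = [sum(row) for row in reach]
    # Ascending sort by count; the position in this list is the node's rank.
    return sorted(range(n), key=lambda i: counts[i])


def permute(pattern, order):
    """Apply the same permutation to rows and columns."""
    return [[pattern[i][j] for j in order] for i in order]


def is_lower_triangular(m):
    n = len(m)
    return all(m[i][j] == 0 for i in range(n) for j in range(i + 1, n))
```

The ordering works because, in a DAG, a node that reaches j also reaches everything j reaches, so its row count is strictly larger and it is ranked after j. Nodes with equal counts have no edge between them, so ties can be broken arbitrarily.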
A fast parallel algorithm for special linear systems of equations using processor arrays with reconfigurable bus systems
A parallel algorithm using Processor Arrays with Reconfigurable Bus Systems
has been designed to solve dense Symmetric Positive Definite (SPD) systems of
equations Ax = b. The key content of this report is the parallelisation of the
algorithm by Delosme & Ipson [8]. In order to design a parallel algorithm for
PARBS, many procedures involved in [8] are handled in a slightly different
way. The parallel time and processor complexity of each step of the
algorithm are calculated. The parallel time complexity is O(n) using 2n × 2n ×
5n Processing Elements
Fast Ant Colony Optimization on Runtime Reconfigurable Processor Arrays
Ant Colony Optimization (ACO) is a metaheuristic used to solve combinatorial optimization problems. As with other metaheuristics, like evolutionary methods, ACO algorithms often show good optimization behavior but are slow when compared to classical heuristics. Hence, there is a need to find fast implementations for ACO algorithms. In order to allow a fast parallel implementation, we propose several changes to a standard form of ACO algorithms. The main new features are the non-generational approach and the use of a threshold based decision function for the ants. We show that the new algorithm has a good optimization behavior and also allows a fast implementation on reconfigurable processor arrays. This is the first implementation of the ACO approach on a reconfigurable architecture. The running time of the algorithm is quasi-linear in the problem size n and the number of ants on a reconfigurable mesh with n^2 processors, each provided with only a constant number of memory words
Mesh Connected Computers With Multiple Fixed Buses: Packet Routing, Sorting and Selection
Mesh connected computers have become attractive models of computing because of their varied special features. In this paper we consider two variations of the mesh model: 1) a mesh with fixed buses, and 2) a mesh with reconfigurable buses. Both these models have been the subject matter of extensive previous research. We solve numerous important problems related to packet routing, sorting, and selection on these models. In particular, we provide lower bounds and very nearly matching upper bounds for the following problems on both these models: 1) Routing on a linear array; and 2) k-k routing, k-k sorting, and cut through routing on a 2D mesh for any k ≥ 12. We provide an improved algorithm for 1-1 routing and a matching sorting algorithm. In addition we present greedy algorithms for 1-1 routing, k-k routing, cut through routing, and k-k sorting that are better on average and supply matching lower bounds. We also show that sorting can be performed in logarithmic time on a mesh with fixed buses. As a consequence we present an optimal randomized selection algorithm. In addition we provide a selection algorithm for the mesh with reconfigurable buses whose time bound is significantly better than the existing ones. Our algorithms have considerably better time bounds than many existing best known algorithms
Scaling Simulations of Reconfigurable Meshes.
This dissertation deals with reconfigurable bus-based models, a new type of parallel machine that uses dynamically alterable connections between processors to allow efficient communication and to perform fast computations. We focus this work on the Reconfigurable Mesh (R-Mesh), one of the most widely studied reconfigurable models. We study the ability of the R-Mesh to adapt an algorithm instance of an arbitrary size to run on a given smaller model size without significant loss of efficiency. A scaling simulation achieves this adaptation, and the simulation overhead expresses the efficiency of the simulation. We construct a scaling simulation for the Fusing-Restricted Reconfigurable Mesh (FR-Mesh), an important restriction of the R-Mesh. The overhead of this simulation depends only on the simulating machine size and not on the simulated machine size. The results of this scaling simulation extend to a variety of concurrent write rules and also translate to an improved scaling simulation of the R-Mesh itself. We present a bus linearization procedure that transforms an arbitrary non-linear bus configuration of an R-Mesh into an equivalent acyclic linear bus configuration implementable on a Linear Reconfigurable Mesh (LR-Mesh), a weaker version of the R-Mesh. This procedure gives the algorithm designer the liberty of using buses of arbitrary shape, while automatically translating the algorithm to run on a simpler platform. We illustrate our bus linearization method through two important applications. The first leads to a faster scaling simulation of the R-Mesh. The second application adapts algorithms designed for R-Meshes to run on models with pipelined optical buses. We also present a simulation of a Directional Reconfigurable Mesh (DR-Mesh) on an LR-Mesh. This simulation is much more efficient than previous work. In addition to the LR-Mesh, this simulation also runs on models that use pipelined optical buses
GRAPE-5: A Special-Purpose Computer for N-body Simulation
We have developed a special-purpose computer for gravitational many-body
simulations, GRAPE-5. GRAPE-5 is the successor of GRAPE-3. Both consist of
eight custom pipeline chips (G5 chip and GRAPE chip). The differences between
GRAPE-5 and GRAPE-3 are: (1) The G5 chip contains two pipelines operating at 80
MHz, while the GRAPE chip had one at 20 MHz. Thus, the calculation speed of the
G5 chip and that of GRAPE-5 board are 8 times faster than that of GRAPE chip
and GRAPE-3 board. (2) The GRAPE-5 board adopted PCI bus as the interface to
the host computer instead of VME of GRAPE-3, resulting in the communication
speed one order of magnitude faster. (3) In addition to the pure 1/r potential,
the G5 chip can calculate forces with arbitrary cutoff functions, so that it
can be applied to Ewald or P^3M methods. (4) The pairwise force calculated on
GRAPE-5 is about 10 times more accurate than that on GRAPE-3. On one GRAPE-5
board, one timestep of 128k-body simulation with direct summation algorithm
takes 14 seconds. With Barnes-Hut tree algorithm (theta = 0.75), one timestep
of 10^6-body simulation can be done in 16 seconds. Comment: 19 pages, 24 Postscript figures, 3 tables, Latex, submitted to
Publications of the Astronomical Society of Japa
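The direct summation algorithm mentioned in this abstract evaluates every pairwise force explicitly. A minimal software sketch of that idea follows; it is not the GRAPE-5 pipeline itself. The function name, the choice of units with G = 1, and the optional softening length `eps` are assumptions made for illustration and do not come from the abstract.

```python
import math


def direct_accelerations(pos, mass, eps=0.0):
    """Gravitational accelerations by O(N^2) direct summation.

    pos  : list of [x, y, z] positions
    mass : list of masses
    eps  : softening length (assumed parameter; 0.0 gives the pure 1/r potential)
    Units with G = 1 are assumed.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip self-interaction
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                # Acceleration on i due to j: m_j * (r_j - r_i) / |r|^3
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc
```

The double loop makes the cost grow as N^2 per timestep, which is why tree methods such as Barnes-Hut, also mentioned in the abstract, are used for larger particle counts.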
Field Programmable Reconfigurable Mesh (FPRM)
Many application areas demand increasing amounts of processing capabilities. FPGAs have been widely used for improving this performance. FPRM (Field Programmable Reconfigurable Mesh) is a technique we propose to improve FPGA performance. A Reconfigurable Mesh (RM) consists of a grid of Processing Elements that use dynamic reconfigurations to create varying bus segments between them. The RM can thus perform computations such as Sorting or Counting in a constant number of steps. It has long been speculated that the RM’s dynamic reconfigurations should replace the FPGA’s static reconfigurations. We show that the RM is capable of not only speeding up specific computations such as sorting or summing, but also of speeding up the evaluation of Boolean circuits (BCs), which is the main purpose of the FPGA. Our proposed RM algorithm can evaluate BCs without causing size blowup. Furthermore, tri-state switching elements can be used instead of PEs in a grid
Distributed and Interactive Simulations Operating at Large Scale for Transcontinental Experimentation
This paper addresses the use of emerging technologies to respond to the increasing needs for larger and more sophisticated agent-based simulations of urban areas. The U.S. Joint Forces Command has found it useful to seek out and apply technologies largely developed for academic research in the physical sciences. The use of these techniques in transcontinentally distributed, interactive experimentation has been shown to be effective and stable and the analyses of the data find parallels in the behavioral sciences. The authors relate their decade and a half experience in implementing high performance computing hardware, software and user interface architectures. These have enabled heretofore unachievable results. They focus on three advances: the use of general purpose graphics processing units as computing accelerators, the efficiencies derived from implementing interest managed routers in distributed systems, and the benefits of effective data management for the voluminous information
Design and process/measurement for immersed element control in a reconfigurable vertically falling soap film
Thesis (S.B.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2007. Includes bibliographical references (p. 24-25). Reinforcement learning has proven successful at harnessing the passive dynamics of underactuated systems to achieve least energy solutions. However, coupled fluid-structural models are too computationally intensive for in-the-loop control in viscous flow regimes. My vertically falling soap film will provide a reconfigurable experimental environment for machine learning controllers. The real-time position and velocity data will be collected with a High Speed Video system, illuminated by a Low Pressure Sodium Lamp. Approximating lines of interference within the soap film to known pressure variations, controllers will shape downstream flow to desired conditions. Though accurate measurement still eludes those without Laser Doppler Velocimetry, order of magnitude Reynolds numbers can be estimated to describe the regime of controller inquiry. by John Glowa. S.B