6,246 research outputs found

    An improved generalization of mesh-connected computers with multiple buses

    Get PDF
    ©2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.Mesh-connected computers (MCCs) are a class of important parallel architectures due to their simple and regular interconnections. However, their performances are restricted by their large diameters. Various augmenting mechanisms have been proposed to enhance the communication efficiency of MCCs. One major approach is to add nonconfigurable buses for improved broadcasting. A typical example is the mesh-connected computer with multiple buses (MMB). We propose a new class of generalized MMBs, the improved generalized MMBs (IMMBs). We compare IMMBs with MMBs and a class of previously proposed generalized MMBs (GMMBs). We show the power of IMMBs by considering semigroup and prefix computations. Specifically, as our main result we show that for any constant 0<&epsiv;<1, one can construct an N½×N½ square IMMB using which semigroup and prefix computations on N operands can be carried out in O(N&epsiv;) time, while maintaining O(1) broadcasting time. Compared with the previous best complexities O(N&frac18;) and O(N&frac116;) achieved on a rectangular MMB and GMMB, respectively, for the same computations, our results show that IMMBs are more powerful than MMBs and GMMBsYi Pen; Zheng, S.Q.; Keqin Li; Hong She

    A fast parallel algorithm for special linear systems of equations using processor arrays with reconfigurable bus systems

    Get PDF
    A parallel algorithm using Processor Arrays with Reconfigurable Bus Systems has been designed to solve dense Symmetric Positive Definite (SPD) systems of equations Ax = b. The key content of this report is the parallelisation of the algorithm by Delosme & Ipson [8]. In order to design a parallel algorithm for PARBS, many procedures involved in [8] are handled in a slightly different way. The parallel time and processor’s complexity of each step of the algorithm is calculated. The parallel time complexity is O(n) using 2n × 2n × 5n number of Processing Elements

    Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code

    Get PDF
    Today, large scale parallel simulations are fundamental tools to handle complex problems. The number of processors in current computation platforms has been recently increased and therefore it is necessary to optimize the application performance and to enhance the scalability of massively-parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware, like FPGAs, to accelerate the most time consuming functions are considered as a strong alternative to boost the performance. In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: Optimization, parallelization and hardware acceleration. At first, a profiling analysis of the most time-consuming processes of the Reynolds Averaged Navier Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, a study of the code scalability with new partitioning algorithms are tested to show the most suitable partitioning algorithms for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented

    Floating-Point Matrix Product on FPGA

    Get PDF
    This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.---- Copyright IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE

    Constant-Time Algorithms for Minimum Spanning Tree and Related Problems on Processor Array with Reconfigurable Bus Systems

    Get PDF
    [[abstract]]A processor array with a reconfigurable bus system is a parallel computation model that consists of a processor array and a reconfigurable bus system. In this paper, a constant-time algorithm is proposed on this model for finding the cycles in an undirected graph. We can use this algorithm to decide whether a specified edge belongs to the minimum spanning tree of the graph or not. This cycle-finding algorithm is designed on a two-dimensional n×nn\times n processor array with a reconfigurable bus system, where nn is the number of vertices in the graph. Based on this cycle-finding algorithm, the minimum spanning tree problem and the spanning tree problem can be solved in O(1) time by using fewer processors than before, O(n×m×nn\times m\times n) and O(n3n^3) processors respectively. This is a substantial improvement over previous known results. Moreover, we also propose two constant-time algorithms for solving the minimum spanning tree verification problem and spanning tree verification problem by using O(n3n^3) and O(n2n^2) processors, respectively.
    corecore