1,424 research outputs found

    Formal synthesis of control signals for systolic arrays

    Get PDF

    Time-Optimal and Conflict-Free Mappings of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays

    Get PDF
    Most existing methods of mapping algorithms into processor arrays are restricted to the case where n-dimensional algorithms or algorithms with n nested loops are mapped into (n—l)-dimensional arrays. However, in practice, it is interesting to map n-dimensional algorithms into (k —l)-dimensional arrays where k\u3c.n. For example, many algorithms at bit-level are at least 4-dimensional (matrix multiplication, convolution, LU decomposition, etc.) and most existing bit level processor arrays are 2-dimensional. A computational conflict occurs if two or more computations of an algorithm are mapped into the same processor and the same execution time. In this paper, necessary and sufficient conditions are derived to identify all mappings without computational conflicts, based on the Hermite normal form of the mapping matrix. These conditions are used to propose methods of mapping any n-dimensional algorithm into (k— l)-dimensional arrays, kn—3, optimality of the mapping is guaranteed

    Parallelization of dynamic programming recurrences in computational biology

    Get PDF
    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays: FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15-130x faster than a modern dual core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms

    Effective network grid synthesis and optimization for high performance very large scale integration system design

    Get PDF
    制度:新 ; 文部省報告番号:甲2642号 ; 学位の種類:博士(工学) ; 授与年月日:2008/3/15 ; 早大学位記番号:新480

    Preliminary Report on High-Performance Computational Structures for Robot Control

    Get PDF
    In this report we present some initial results of our work completed thus far on Computational Structures for Robot Control . A SIMD architecture with the crossbar interprocessor network which achieves the parallel processing execution time lower bound of o( [a1n ]), where a1 is a constant and n is the number of manipulator joints, for the computation of the inverse dynamics problem, is discussed. A novel SIMD task scheduling algorithm that optimizes the parallel processing performance on the indicated architecture is also delineated. Simulations performed on this architecture show speedup factor of 3.4 over previous related work completed for the evaluation of the specified problem, is achieved. Parallel processing of PUMA forward and inverse kinematics solutions is next investigated using a particular scheduling algorithm. In addition, a custom bit-serial array architecture is designed for the computation of the inverse dynamics problem within the bit-serial execution time lower bound of o(c1k + c2kn), where c1 and c2 are specified constants, k is the word length, and n is the number of manipulator joints. Finally, mapping of the Newton-Euler equations onto a fixed systolic array is investigated. A balanced architecture for the inverse dynamics problem which achieves the systolic execution time lower bound for the specified problem is depicted. Please note again that these results are only preliminary and improvements to our algorithms and architectures are currently still being made

    On the synthesis of integral and dynamic recurrences

    Get PDF
    PhD ThesisSynthesis techniques for regular arrays provide a disciplined and well-founded approach to the design of classes of parallel algorithms. The design process is guided by a methodology which is based upon a formal notation and transformations. The mathematical model underlying synthesis techniques is that of affine Euclidean geometry with embedded lattice spaces. Because of this model, computationally powerful methods are provided as an effective way of engineering regular arrays. However, at present the applicability of such methods is limited to so-called affine problems. The work presented in this thesis aims at widening the applicability of standard synthesis methods to more general classes of problems. The major contributions of this thesis are the characterisation of classes of integral and dynamic problems, and the provision of techniques for their systematic treatment within the framework of established synthesis methods. The basic idea is the transformation of the initial algorithm specification into a specification with data dependencies of increased regularity, so that corresponding regular arrays can be obtained by a direct application of the standard mapping techniques. We will complement the formal development of the techniques with the illustration of a number of case studies from the literature.EPSR

    A bibliography on parallel and vector numerical algorithms

    Get PDF
    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

    Partitioning of Uniform Dependency Algorithms for Parallel Execution on MIMD/ Systolic Systems

    Get PDF
    An algorithm can be modeled as an index set and a set of dependence vectors. Each index vector in the index set indexes a computation of the algorithm. If the execution of a computation depends on the execution of another computation, then this dependency is represented as the difference between the index vectors of the computations. The dependence matrix corresponds to a matrix where each column is a dependence vector. An independent partition of the index set is such that there are no dependencies between computations that belong to different blocks of the partition. This report considers uniform dependence algorithms with any arbitrary kind of index set and proposes two very simple methods to find independent partitions of the index set. Each method has advantages over the other one for certain kind of application, and they both outperform previously proposed approaches in terms of computational complexity and/or optimality. Also, lower bounds and upper bounds of the cardinality of the maximal independent partitions are given. For some algorithms it is shown that the cardinality of the maximal partition is equal to the greatest common divisor of some subdeterminants of the dependence matrix. In an MIMD/multiple systolic array computation environment, if different blocks of ail independent partition are assigned to different processors/arrays, the communications between processors/arrays will be minimized to zero. This is significant because the communications usually dominate the overhead in MIMD machines. Some issues of mapping partitioned algorithms into MIMD/systolic systems are addressed. Based on the theory of partitioning, a new method is proposed to test if a system of linear Diophantine equations has integer solutions
    corecore