307 research outputs found

    Unifying mesh- and tree-based programmable interconnect

    Get PDF
    We examine the traditional, symmetric, Manhattan mesh design for field-programmable gate-array (FPGA) routing along with tree-of-meshes (ToM) and mesh-of-trees (MoT) based designs. All three networks can provide general routing for limited bisection designs (Rent's rule with p<1) and allow locality exploitation. They differ in their detailed topology and use of hierarchy. We show that all three have the same asymptotic wiring requirements. We bound this tightly by providing constructive mappings between routes in one network and routes in another. For example, we show that a (c,p) MoT design can be mapped to a (2c,p) linear population ToM and introduce a corner turn scheme which will make it possible to perform the reverse mapping from any (c,p) linear population ToM to a (2c,p) MoT augmented with a particular set of corner turn switches. One consequence of this latter mapping is a multilayer layout strategy for N-node, linear population ToM designs that requires only /spl Theta/(N) two-dimensional area for any p when given sufficient wiring layers. We further show upper and lower bounds for global mesh routes based on recursive bisection width and show these are within a constant factor of each other and within a constant factor of MoT and ToM layout area. In the process we identify the parameters and characteristics which make the networks different, making it clear there is a unified design continuum in which these networks are simply particular regions

    Packet Switched vs. Time Multiplexed FPGA Overlay Networks

    Get PDF
    Dedicated, spatially configured FPGA interconnect is efficient for applications that require high throughput connections between processing elements (PEs) but with a limited degree of PE interconnectivity (e.g. wiring up gates and datapaths). Applications which virtualize PEs may require a large number of distinct PE-to-PE connections (e.g. using one PE to simulate 100s of operators, each requiring input data from thousands of other operators), but with each connection having low throughput compared with the PE’s operating cycle time. In these highly interconnected conditions, dedicating spatial interconnect resources for all possible connections is costly and inefficient. Alternatively, we can time share physical network resources by virtualizing interconnect links, either by statically scheduling the sharing of resources prior to runtime or by dynamically negotiating resources at runtime. We explore the tradeoffs (e.g. area, route latency, route quality) between time-multiplexed and packet-switched networks overlayed on top of commodity FPGAs. We demonstrate modular and scalable networks which operate on a Xilinx XC2V6000-4 at 166MHz. For our applications, time-multiplexed, offline scheduling offers up to a 63% performance increase over online, packet-switched scheduling for equivalent topologies. When applying designs to equivalent area, packet-switching is up to 2× faster for small area designs while time-multiplexing is up to 5× faster for larger area designs. When limited to the capacity of a XC2V6000, if all communication is known, time-multiplexed routing outperforms packet-switching; however when the active set of links drops below 40% of the potential links, packet-switched routing can outperform time-multiplexing

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Center for Aeronautics and Space Information Sciences

    Get PDF
    This report summarizes the research done during 1991/92 under the Center for Aeronautics and Space Information Science (CASIS) program. The topics covered are computer architecture, networking, and neural nets

    Graph Contraction for Mapping Data on Parallel Computers: A Quality–Cost Tradeoff

    Get PDF

    Efficient Molecular Dynamics Simulation on Reconfigurable Models with MultiGrid Method

    Get PDF
    In the field of biology, MD simulations are continuously used to investigate biological studies. A Molecular Dynamics (MD) system is defined by the position and momentum of particles and their interactions. The dynamics of a system can be evaluated by an N-body problem and the simulation is continued until the energy reaches equilibrium. Thus, solving the dynamics numerically and evaluating the interaction is computationally expensive even for a small number of particles in the system. We are focusing on long-ranged interactions, since the calculation time is O(N^2) for an N particle system. In this dissertation, we are proposing two research directions for the MD simulation. First, we design a new variation of Multigrid (MG) algorithm called Multi-level charge assignment (MCA) that requires O(N) time for accurate and efficient calculation of the electrostatic forces. We apply MCA and back interpolation based on the structure of molecules to enhance the accuracy of the simulation. Our second research utilizes reconfigurable models to achieve fast calculation time. We have been working on exploiting two reconfigurable models. We design FPGA-based MD simulator implementing MCA method for Xilinx Virtex-IV. It performs about 10 to 100 times faster than software implementation depending on the simulation accuracy desired. We also design fast and scalable Reconfigurable mesh (R-Mesh) algorithms for MD simulations. This work demonstrates that the large scale biological studies can be simulated in close to real time. The R-Mesh algorithms we design highlight the feasibility of these models to evaluate potentials with faster calculation times. Specifically, we develop R-Mesh algorithms for both Direct method and Multigrid method. The Direct method evaluates exact potentials and forces, but requires O(N^2) calculation time for evaluating electrostatic forces on a general purpose processor. The MG method adopts an interpolation technique to reduce calculation time to O(N) for a given accuracy. However, our R-Mesh algorithms require only O(N) or O(logN) time complexity for the Direct method on N linear R-Mesh and N¡¿N R-Mesh, respectively and O(r)+O(logM) time complexity for the Multigrid method on an X¡¿Y¡¿Z R-Mesh. r is N/M and M = X¡¿Y¡¿Z is the number of finest grid points
    • …
    corecore