2,187 research outputs found

    Configuration Sharing Optimized Placement and Routing

    Get PDF
    Reconfigurable systems have been shown to achieve very high computational performance. However, the overhead associated with reconfiguration of hardware remains a critical factor in overall system performance. This paper discusses the development and evaluation of a technique to minimize the delay associated with reconfiguration based upon optimized sharing of configuration bit streams between design contexts. This is achieved through modified placement and routing algorithms

    Throughput-driven floorplanning with wire pipelining

    Get PDF
    The size of future high-performance SoC is such that the time-of-flight of wires connecting distant pins in the layout can be much higher than the clock period. In order to keep the frequency as high as possible, the wires may be pipelined. However, the insertion of flip-flops may alter the throughput of the system due to the presence of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design where long interconnects are pipelined by inserting the throughput in the cost function of a tool based on simulated annealing. The results obtained on a series of benchmarks are then validated using a simple router that breaks long interconnects by suitably placing flip-flops along the wires

    Performance and power optimization in VLSI physical design

    Get PDF
    As VLSI technology enters the nanoscale regime, a great amount of efforts have been made to reduce interconnect delay. Among them, buffer insertion stands out as an effective technique for timing optimization. A dramatic rise in on-chip buffer density has been witnessed. For example, in two recent IBM ASIC designs, 25% gates are buffers. In this thesis, three buffer insertion algorithms are presented for the procedure of performance and power optimization. The second chapter focuses on improving circuit performance under inductance effect. The new algorithm works under the dynamic programming framework and runs in provably linear time for multiple buffer types due to two novel techniques: restrictive cost bucketing and efficient delay update. The experimental results demonstrate that our linear time algorithm consistently outperforms all known RLC buffering algorithms in terms of both solution quality and runtime. That is, the new algorithm uses fewer buffers, runs in shorter time and the buffered tree has better timing. The third chapter presents a method to guarantee a high fidelity signal transmission in global bus. It proposes a new redundant via insertion technique to reduce via variation and signal distortion in twisted differential line. In addition, a new buffer insertion technique is proposed to synchronize the transmitted signals, thus further improving the effectiveness of the twisted differential line. Experimental results demonstrate a 6GHz signal can be transmitted with high fidelity using the new approaches. In contrast, only a 100MHz signal can be reliably transmitted using a single-end bus with power/ground shielding. Compared to conventional twisted differential line structure, our new techniques can reduce the magnitude of noise by 45% as witnessed in our simulation. The fourth chapter proposes a buffer insertion and gate sizing algorithm for million plus gates. The algorithm takes a combinational circuit as input instead of individual nets and greatly reduces the buffer and gate cost of the entire circuit. The algorithm has two main features: 1) A circuit partition technique based on the criticality of the primary inputs, which provides the scalability for the algorithm, and 2) A linear programming formulation of non-linear delay versus cost tradeoff, which formulates the simultaneous buffer insertion and gate sizing into linear programming problem. Experimental results on ISCAS85 circuits show that even without the circuit partition technique, the new algorithm achieves 17X speedup compared with path based algorithm. In the meantime, the new algorithm saves 16.0% buffer cost, 4.9% gate cost, 5.8% total cost and results in less circuit delay

    Interconnect tree optimization algorithm in nanometer very large scale integration designs

    Get PDF
    This thesis proposes a graph-based maze routing and buffer insertion algorithm for nanometer Very Large Scale Integration (VLSI) layout designs. The algorithm is called Hybrid Routing Tree and Buffer insertion with Look-Ahead (HRTB-LA). In recent VLSI designs, interconnect delay becomes a dominant factor compared to gate delay. The well-known technique to minimize the interconnect delay is by inserting buffers along the interconnect wires. In conventional buffer insertion algorithms, the buffers are inserted on the fixed routing paths. However, in a modern design, there are macro blocks that prohibit any buffer insertion in their respective area. Most of the conventional buffer insertion algorithms do not consider these obstacles. In the presence of buffer obstacles, post routing algorithm may produce poor solution. On the other hand, simultaneous routing and buffer insertion algorithm offers a better solution, but it was proven to be NP-complete. Besides timing performance, power dissipation of the inserted buffers is another metric that needs to be optimized. Research has shown that power dissipation overhead due to buffer insertions is significantly high. In other words, interconnect delay and power dissipation move in opposite directions. Although many methodologies to optimize timing performance with power constraint have been proposed, no algorithm is based on grid graph technique. Hence, the main contribution of this thesis is an efficient algorithm using a hybrid approach for multi-constraint optimization in multi-terminal nets. The algorithm uses dynamic programming to compute the interconnect delay and power dissipation of the inserted buffers incrementally, while an effective runtime is achieved with the aid of novel look-ahead and graph pruning schemes. Experimental results prove that HRTB-LA is able to handle multi-constraint optimizations and produces up to 47% better solution compared to a post routing buffer insertion algorithm in comparable runtime

    Placement and routing for reconfigurable systems.

    Get PDF
    Applications using reconfigurable logic have been widely demonstrated to offer better performance over software-based solutions. However, good performance rating is often destroyed by poor reconfiguration latency - time required to reconfigure hardware to perform the new task. Recent research focus on design automation techniques to address reconfiguration latency bottleneck. The contribution to novelty of this thesis is in new placement and routing techniques resulting in minimising reconfiguration latency of reconfigurable systems. This presents a part of design process concerned with positioning and connecting design blocks in a logic gate array. The aim of the research is to optimise the placement and interconnect strategy such that dynamic changes in system functionality can be achieved with minimum delay. A review of previous work in the field is given and the relevant theoretical framework developed. The dynamic reconfiguration problem is analysed for various reconfigurable technologies. Several algorithms are developed and evaluated using a representative set of problem domains to assess their effectiveness. Results obtained with novel placement and routing techniques demonstrate configuration data size reduction leading to significant reconfiguration latency improvements

    Design of testbed and emulation tools

    Get PDF
    The research summarized was concerned with the design of testbed and emulation tools suitable to assist in projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software based set of hierarchical tools was chosen to provide maximum flexibility, to ease in moving to new computers as technology improves and to take advantage of the inherent reliability and availability of commercially available computing systems

    Scalable and deterministic timing-driven parallel placement for FPGAs

    Full text link
    • ā€¦
    corecore