
    Placement driven retiming with a coupled edge timing model

    Retiming is a widely investigated technique for performance optimization. It performs powerful modifications on a circuit netlist. However, it is often unclear whether the predicted performance improvement will remain valid after placement has been performed. This paper presents a new retiming algorithm using a highly accurate timing model that takes into account the effect of retiming on the capacitive loads of individual wires as well as fanout systems. We propose the integration of retiming into a timing-driven standard-cell placement environment based on simulated annealing. Retiming is used as an optimization technique throughout the whole placement process. The experimental results show the benefit of the proposed approach. In comparison with the conventional design flow based on standard FEAS, our approach achieved an improvement in cycle time of up to 34%, and of 17% on average.
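    The core idea of folding retiming into an annealing-based placer can be illustrated with a small sketch. The Python fragment below is only an illustration of the general technique, not the authors' implementation; the netlist helpers (random_cell_move, random_retime_move, undo, cost) are hypothetical, and cost() merely stands in for the paper's coupled edge timing model.

        import math, random

        def anneal_place_with_retiming(netlist, T0=1.0, T_min=1e-3, alpha=0.95,
                                       moves_per_T=1000, retime_prob=0.1):
            # Simulated-annealing placement loop that occasionally applies a
            # retiming move instead of an ordinary cell move.  All netlist
            # helpers are hypothetical placeholders.
            T = T0
            cost = netlist.cost()
            while T > T_min:
                for _ in range(moves_per_T):
                    if random.random() < retime_prob:
                        move = netlist.random_retime_move()   # shift registers across gates
                    else:
                        move = netlist.random_cell_move()     # perturb one cell location
                    new_cost = netlist.cost()                 # wirelength + timing estimate
                    delta = new_cost - cost
                    if delta <= 0 or random.random() < math.exp(-delta / T):
                        cost = new_cost                       # accept the move
                    else:
                        netlist.undo(move)                    # reject and roll back
                T *= alpha                                    # geometric cooling schedule
            return netlist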

    Relaxing Synchronization in Distributed Simulated Annealing

    Simulated annealing is an attractive, but expensive, heuristic for approximating the solution to combinatorial optimization problems. Since simulated annealing is a general-purpose method, it can be applied, with careful control of the cooling schedule, to a broad range of NP-complete problems such as the traveling salesman problem, problems in graph theory, and cell placement. Attempts to parallelize simulated annealing, particularly on distributed-memory multicomputers, are hampered by the algorithm's requirement of a globally consistent system state. In a multicomputer, maintaining the global state S involves explicit message traffic and is a critical performance bottleneck. One way to mitigate this bottleneck is to amortize the overhead of these state updates over as many parallel state changes as possible. This technique, however, introduces errors in the computed cost C(S) of a particular state S into the annealing process. This dissertation places analytically derived bounds on the cost error in order to assure convergence to the correct result. The resulting parallel simulated annealing algorithm dynamically changes the frequency of global updates as a function of the annealing control parameter, i.e. temperature. Implementation results on an Intel iPSC/2 are reported.
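    A minimal sketch of the relaxed-synchronization idea follows, assuming hypothetical callbacks for move evaluation and for the message-passing global update; it is not the dissertation's algorithm, but shows how the interval between global state updates can be tied to the temperature.

        import math, random

        def sync_interval(T, base_interval=64, T_ref=1.0):
            # Hypothetical schedule: at high temperature larger cost errors are
            # tolerable, so global updates are amortized over more local moves;
            # near freezing the nodes synchronize after almost every move.
            return max(1, int(base_interval * min(1.0, T / T_ref)))

        def worker_anneal(state, T, steps, propose_delta, commit, undo, broadcast):
            # Each node accepts or rejects moves against a possibly stale copy of
            # the global state S and only reconciles every sync_interval(T) moves;
            # propose_delta, commit, undo and broadcast are assumed callbacks
            # supplied by the surrounding annealing framework.
            interval = sync_interval(T)
            for step in range(steps):
                delta = propose_delta(state)              # may be in error vs. the true C(S)
                if delta <= 0 or random.random() < math.exp(-delta / T):
                    commit(state)
                else:
                    undo(state)
                if (step + 1) % interval == 0:
                    state = broadcast(state)              # exchange accumulated changes
            return state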

    A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

    A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small- and large-distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies of the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is also proposed, based on the improved results obtained from parallelizing the simulated annealing algorithm.
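    The dimension-by-dimension tree broadcast on a hypercube can be sketched as follows; send(src, dst) is an assumed message primitive, and the code only illustrates the communication pattern used to propagate updated cell locations, not the paper's implementation.

        def hypercube_broadcast(source, n_dims, send):
            # Tree broadcast on an n-dimensional hypercube: in step d every node
            # that already holds the update forwards it across dimension d, so
            # all 2**n_dims nodes are reached in n_dims steps.
            have = {source}
            for d in range(n_dims):
                for node in list(have):
                    partner = node ^ (1 << d)      # neighbor across dimension d
                    if partner not in have:
                        send(node, partner)
                        have.add(partner)
            return have

        # Example: a broadcast from node 0 of a 3-cube reaches all 8 processors.
        assert hypercube_broadcast(0, 3, send=lambda s, t: None) == set(range(8))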

    Modeling of a hardware VLSI placement system: Accelerating the Simulated Annealing algorithm

    An essential step in the automation of electronic design is the placement of the physical components on the target semiconductor die. The placement step presents the opportunity to reduce costs in terms of wire length and performance degradation; however, it is compute-intensive and is NP-complete in terms of obtaining an optimal solution. As designs have grown in complexity and gate count, obtaining an optimal solution is not feasible due to time-to-market constraints or the sheer compute effort required. Heuristic algorithms allow efficient but sub-optimal designs to be produced with a reduction in processing time. A widely used algorithm is Simulated Annealing (SA). The goal of this work was to develop a model that would enable an analysis of the feasibility of developing a hardware-accelerated placement system which uses SA at its core. The SA heuristic was analyzed for possible improvements in efficiency, with focus given to targeting the system for hardware. A solution implementing parallel computing with specialized hardware configurations inside a field programmable gate array (FPGA) was investigated as having the potential to improve the efficiency of the SA-based algorithm. All supporting subsystems were also described for a hardware-accelerated model. A large speedup was analytically shown from both accelerating the critical path of the SA algorithm and novel methods of improving SA's efficiency. As data throughput requirements were not included in this work, the results presented may be optimistic for an overall system speedup. However, the results clearly show that future work is warranted in studying the concept of a hardware-accelerated placement system.
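    The critical path referred to above is essentially the per-move accept/reject test driven by the incremental cost change; a minimal software rendering of that step, which a hardware pipeline would evaluate on every candidate move, might look like the sketch below (an illustration, not the thesis's design).

        import math, random

        def accept_move(delta_cost, T):
            # The accept/reject decision on the SA critical path: downhill moves
            # are always kept, uphill moves are kept with probability
            # exp(-delta/T).  A hardware implementation would pipeline this test
            # together with the incremental wirelength computation feeding it.
            return delta_cost <= 0 or random.random() < math.exp(-delta_cost / T)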

    Graphics Processing Unit-Based Computer-Aided Design Algorithms for Electronic Design Automation

    Electronic design automation (EDA) tools are a specific set of software programs that play important roles in modern integrated circuit (IC) design. This software automates the IC design process across various stages. Among these stages, two important EDA design tools are the focus of this research: floorplanning and global routing. Specifically, the goal of this study is to parallelize these two tools such that their execution time can be significantly shortened on modern multi-core and graphics processing unit (GPU) architectures. The GPU hardware is a massively parallel architecture, enabling thousands of independent threads to execute concurrently. Although a small set of EDA tools can benefit from using GPUs to accelerate their speed, most algorithms in this field are designed with the single-core paradigm in mind. The floorplanning and global routing algorithms are among the latter, and are difficult to speed up on the GPU due to their inherently sequential nature. This work parallelizes the floorplanning and global routing algorithms through a novel approach and achieves significant speedups for both tools implemented on GPU hardware. Specifically, with a complete overhaul of solution space and design space exploration, a GPU-based floorplanning algorithm is able to achieve a 4-166X speedup while obtaining similar or improved solutions compared with the sequential algorithm. The GPU-based global routing algorithm is shown to achieve significant speedup against existing state-of-the-art routers while delivering competitive solution quality. Importantly, this parallel model for global routing renders a stable solution that is independent of the level of parallelism. In summary, this research has shown that, through a design paradigm overhaul, sequential algorithms can also benefit from the massively parallel architecture. The findings of this study have a positive impact on the efficiency and design quality of the modern EDA design flow.
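    One common way to expose GPU parallelism in such sequential heuristics is to score many candidate perturbations of the current solution at once; the sketch below illustrates that batched pattern in plain Python, with propose and evaluate as assumed problem-specific callbacks, and is not the dissertation's actual algorithm.

        def batched_floorplan_step(solution, propose, evaluate, batch=1024):
            # GPU-style exploration: generate and score many independent
            # perturbations of the current floorplan (one per thread on real
            # hardware) and keep the best one if it improves on the incumbent.
            current_cost = evaluate(solution)
            candidates = [propose(solution) for _ in range(batch)]   # parallel on a GPU
            costs = [evaluate(c) for c in candidates]                # parallel on a GPU
            best_cost = min(costs)
            if best_cost < current_cost:
                return candidates[costs.index(best_cost)], best_cost
            return solution, current_cost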

    Parallelization of Stochastic Evolution

    The complexity involved in VLSI design and its sub-problems has always made them ideal application areas for non-deterministic iterative heuristics. However, the major drawback has been the large runtime involved in reaching acceptable solutions, especially in the case of multi-objective optimization problems. Among the acceleration techniques proposed, parallelization of iterative heuristics is a promising one. The motivations for parallel CAD include faster runtimes, handling of larger problem sizes, and exploration of a larger search space. In this work, the development of parallel algorithms for Stochastic Evolution, applied to the multi-objective VLSI cell-placement problem, is presented. In VLSI circuit design, placement is the process of arranging circuit blocks on a layout. In standard cell design, placement consists of determining optimum positions of all blocks on the layout to satisfy the constraints and improve a number of objectives. The placement objectives in our work are to reduce power dissipation and wire length while improving performance (timing). The parallelization is achieved on a cluster of workstations interconnected by a low-latency network, using MPI communication libraries. Circuits from ISCAS-89 are used as benchmarks. Results for parallel Stochastic Evolution are compared with its sequential counterpart as well as with the results achieved by parallel versions of Simulated Annealing, as a reference point for both solution quality and execution time. After parallelization, linear and super-linear speedups were obtained, with no degradation in the quality of the solution.
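    For reference, a minimal sequential sketch of the Stochastic Evolution loop being parallelized is shown below; the placement helpers (cells, try_move, undo, cost) are hypothetical, and the MPI distribution of the per-cell loop across worker nodes is omitted.

        import random

        def stochastic_evolution(placement, p=5.0, R=10):
            # Sequential Stochastic Evolution: each pass tries to move every
            # cell; a move is kept when its gain beats a random threshold drawn
            # from [-p, 0], so slightly uphill moves can escape local minima.
            # cost() stands in for a multi-objective blend of wirelength, power
            # and timing, as described in the abstract.
            cost = best_cost = placement.cost()
            stale = 0
            while stale < R:                              # stop after R non-improving passes
                for cell in placement.cells():
                    placement.try_move(cell)              # perturb one cell
                    new_cost = placement.cost()
                    gain = cost - new_cost                # positive gain == improvement
                    if gain > random.uniform(-p, 0):
                        cost = new_cost                   # accept (possibly slightly uphill)
                    else:
                        placement.undo(cell)              # reject and restore
                if cost < best_cost:
                    best_cost, stale = cost, 0
                else:
                    stale += 1
            return placement, best_cost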
