1,016 research outputs found

    The application of genetic algorithms to high-level synthesis

    Get PDF

    Memory Bounds for the Distributed Execution of a Hierarchical Synchronous Data-Flow Graph

    Get PDF
    International audienceThis paper presents an application analysis technique to define the boundary of shared memory requirements of Multiprocessor System-on-Chip (MPSoC) in early stages of development. This technique is part of a rapid prototyping process and is based on the analysis of a hierarchical Synchronous Data-Flow (SDF) graph description of the system application. The analysis does not require any knowledge of the system architecture, the mapping or the scheduling of the system application tasks. The initial step of the method consists of applying a set of transformations to the SDF graph so as to reveal its memory characteristics. These transformations produce a weighted graph that represents the different memory objects of the application as well as the memory allocation constraints due to their relationships. The memory boundaries are then derived from this weighted graph using analogous graph theory problems, in particular the Maximum-Weight Clique (MWC) problem. Stateof-the-art algorithms to solve these problems are presented and a heuristic approach is proposed to provide a near-optimal solution of the MWC problem. A performance evaluation of the heuristic approach is presented, and is based on hierarchical SDF graphs of realistic applications. This evaluation shows the efficiency of proposed heuristic approach in finding near optimal solutions

    Memory Bounds for the Distributed Execution of a Hierarchical Synchronous Data-Flow Graph

    Get PDF
    International audienceThis paper presents an application analysis technique to define the boundary of shared memory requirements of Multiprocessor System-on-Chip (MPSoC) in early stages of development. This technique is part of a rapid prototyping process and is based on the analysis of a hierarchical Synchronous Data-Flow (SDF) graph description of the system application. The analysis does not require any knowledge of the system architecture, the mapping or the scheduling of the system application tasks. The initial step of the method consists of applying a set of transformations to the SDF graph so as to reveal its memory characteristics. These transformations produce a weighted graph that represents the different memory objects of the application as well as the memory allocation constraints due to their relationships. The memory boundaries are then derived from this weighted graph using analogous graph theory problems, in particular the Maximum-Weight Clique (MWC) problem. Stateof-the-art algorithms to solve these problems are presented and a heuristic approach is proposed to provide a near-optimal solution of the MWC problem. A performance evaluation of the heuristic approach is presented, and is based on hierarchical SDF graphs of realistic applications. This evaluation shows the efficiency of proposed heuristic approach in finding near optimal solutions

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Exploiting parallelism within multidimensional multirate digital signal processing systems

    Get PDF
    The intense requirements for high processing rates of multidimensional Digital Signal Processing systems in practical applications justify the Application Specific Integrated Circuits designs and parallel processing implementations. In this dissertation, we propose novel theories, methodologies and architectures in designing high-performance VLSI implementations for general multidimensional multirate Digital Signal Processing systems by exploiting the parallelism within those applications. To systematically exploit the parallelism within the multidimensional multirate DSP algorithms, we develop novel transformations including (1) nonlinear I/O data space transforms, (2) intercalation transforms, and (3) multidimensional multirate unfolding transforms. These transformations are applied to the algorithms leading to systematic methodologies in high-performance architectural designs. With the novel design methodologies, we develop several architectures with parallel and distributed processing features for implementing multidimensional multirate applications. Experimental results have shown that those architectures are much more efficient in terms of execution time and/or hardware cost compared with existing hardware implementations

    High-level synthesis of VLSI circuits

    Get PDF
    corecore