39 research outputs found

    Tight coupling of timing driven placement and retiming

    Retiming is a widely investigated technique for performance optimization. In general, it performs extensive modifications on a circuit netlist, leaving it unclear whether the achieved performance improvement will remain valid after placement has been performed. This paper presents an approach for integrating retiming into a timing-driven placement environment. The experimental results show the benefit of the proposed approach on circuit performance in comparison with design flows that use retiming only as a pre- or post-placement optimization method.

    Placement driven retiming with a coupled edge timing model

    Retiming is a widely investigated technique for performance optimization. It performs powerful modifications on a circuit netlist; however, it is often unclear whether the predicted performance improvement will remain valid after placement has been performed. This paper presents a new retiming algorithm that uses a highly accurate timing model, taking into account the effect of retiming on the capacitive loads of single wires as well as of fanout systems. We propose the integration of retiming into a timing-driven standard-cell placement environment based on simulated annealing, with retiming used as an optimization technique throughout the whole placement process. The experimental results show the benefit of the proposed approach: in comparison with the conventional design flow based on the standard FEAS algorithm, our approach achieved an improvement in cycle time of up to 34%, and of 17% on average.
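
    For readers unfamiliar with the retiming model the two papers above build on, the sketch below illustrates the standard Leiserson-Saxe bookkeeping: gate delays on nodes, register counts on edges, and a per-node lag r(v) that moves registers across gates without changing behaviour. The example circuit, delays, and lags are made-up toy values, and the code deliberately ignores the placement-aware timing model the paper actually proposes.

```python
# Toy illustration of the classic retiming model (Leiserson-Saxe style);
# the example circuit and the chosen lags are illustrative assumptions only.

def retime(edges, r):
    """Retimed register counts w_r(u, v) = w(u, v) + r(v) - r(u); None if illegal."""
    retimed = {}
    for (u, v), w in edges.items():
        w_r = w + r.get(v, 0) - r.get(u, 0)
        if w_r < 0:
            return None            # a register count went negative: illegal retiming
        retimed[(u, v)] = w_r
    return retimed

def clock_period(delays, edges):
    """Clock period = longest path through edges that carry zero registers."""
    zero_succ = {u: [] for u in delays}
    for (u, v), w in edges.items():
        if w == 0:
            zero_succ[u].append(v)
    memo = {}
    def longest_from(v):
        # worst-case combinational delay starting at v; assumes the zero-register
        # subgraph is acyclic, as it must be in a legal synchronous circuit
        if v not in memo:
            memo[v] = delays[v] + max((longest_from(s) for s in zero_succ[v]), default=0)
        return memo[v]
    return max(longest_from(v) for v in delays)

delays = {"v1": 6, "v2": 2, "v3": 2}                          # gate delays
edges  = {("v1", "v2"): 0, ("v2", "v3"): 0, ("v3", "v1"): 2}  # register counts
print(clock_period(delays, edges))            # 10: combinational path v1 -> v2 -> v3
better = retime(edges, {"v2": 1, "v3": 1})    # move one register onto edge v1 -> v2
print(clock_period(delays, better))           # 6: v1 is now separated from v2 by a register
```

    The point both papers make is that such a netlist-level period estimate ignores the wire and fanout loads that only become known after placement, which is why they couple retiming with the placer rather than applying it purely before or after placement.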

    A Unifying Framework for Systolic Designs

    Systematic design of two level pipelined systolic arrays with data contraflow

    Many systolic algorithms and related design methodologies have recently been proposed. Frequently, these systolic algorithms do not take practical considerations into account. An equitably distributed load between processing elements, pipelined functional units, etc., are desirable features when implementing systolic algorithms. In this paper we present a design methodology in which these features are considered. As an example, the methodology is applied to obtain a problem-size-independent, two-level pipelined 1D systolic algorithm with data contraflow to efficiently solve triangular systems of equations.
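
    As a point of reference only, the target computation of that example, solving a lower-triangular system L x = b, is plain forward substitution. The snippet below shows the arithmetic the proposed 1D array pipelines; it says nothing about the array's cell program, its two-level pipelining, or its data contraflow.

```python
# Reference forward substitution for L x = b with L lower triangular; the paper's
# systolic array computes the same result in a pipelined, problem-size-independent way.
def forward_substitution(L, b):
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))   # contributions of already-known x_j
        x[i] = (b[i] - s) / L[i][i]
    return x

L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 1.0, 5.0]]
b = [2.0, 5.0, 16.0]
print(forward_substitution(L, b))   # [1.0, 1.333..., 2.133...]
```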

    On synthesizing systolic arrays from recurrence equations with linear dependencies

    We present a technique for synthesizing systolic architectures from recurrence equations. A class of such equations (recurrence equations with linear dependencies) is defined, and the problem of mapping such equations onto a two-dimensional architecture is studied. We show that such a mapping is provided by means of a linear allocation and timing function. An important result is that under such a mapping the dependencies remain linear. After obtaining a two-dimensional architecture by applying such a mapping, a systolic array can be derived if the communication can be spatially and temporally localized. We show that a simple test consisting of finding the zeroes of a matrix is sufficient to determine whether this localization can be achieved by pipelining, and we give a construction that generates the array when such pipelining is possible. The technique is illustrated by automatically deriving a well-known systolic array for factoring a band matrix into lower and upper triangular factors.
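
    To make the notions of a linear timing and allocation function concrete, here is a small, self-contained check of the two properties such a mapping must satisfy: every dependency advances time, and no two index points land on the same processor at the same step. The domain, the dependency vector, and the mapping λ = (1, 1), σ = (0, 1) are toy assumptions, not the mapping the paper derives for the band-matrix factorization.

```python
# Toy space-time mapping check: timing t(z) = lam . z, allocation p(z) = sig . z.
# Domain, dependency, and mapping vectors are illustrative assumptions.
import itertools

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def check_mapping(domain, deps, lam, sig):
    # causality: every dependency must take at least one time step
    for d in deps:
        assert dot(lam, d) >= 1, f"dependency {d} violates the schedule"
    # no conflicts: distinct index points never share a (processor, time) slot
    slots = {(dot(sig, z), dot(lam, z)) for z in domain}
    assert len(slots) == len(domain), "two computations collide on one cell"
    return True

N = 4
domain = list(itertools.product(range(N), range(N)))  # index points (i, j)
deps = [(0, 1)]              # e.g. an accumulation y(i, j) <- y(i, j - 1)
lam, sig = (1, 1), (0, 1)    # time step i + j, processor j
print(check_mapping(domain, deps, lam, sig))          # True for this toy mapping
```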

    Systolic array synthesis by static analysis of program dependencies

    We present a technique for mapping recurrence equations to systolic arrays. While this problem has been studied in considerable detail, the recurrence equations analysed here are a generalization of those studied previously. In an earlier paper [14] we showed how systolic arrays can be synthesized from such generalized recurrence equations by a combination of affine transformations and explicit pipelining. This paper extends those results in two directions. Firstly, a multistage pipelining technique is proposed, which permits the synthesis of systolic arrays with irregular data flow. Secondly, we develop analysis techniques for the synthesis of systolic arrays whose computation is governed by control signals, in a systematic manner that is amenable to mechanization. The full paper also discusses how these techniques can be applied to the mapping problem for more general architectures.

    Systolic convolution of arithmetic functions

    Given two arithmetic functions f and g, their convolution h = f ∗ g is defined as h(n) = Σ f(k)g(l), the sum taken over all k, l with kl = n and 1 ≤ k, l ≤ n, for all n ≥ 1. Given two arithmetic functions g and h, the inverse convolution problem is to determine f such that f ∗ g = h. In this paper, we propose two linear arrays for the real-time computation of the convolution and the inverse convolution problem. These arrays extend the design of Verhoeff for the computation of the Möbius function μ, defined as the solution of the inverse convolution problem μ ∗ g = ε, where g(n) = 1 for all n ≥ 1 and ε(n) = 1 if n = 1, ε(n) = 0 if n > 1.
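
    The convolution in question is the Dirichlet convolution of arithmetic functions. The snippet below is a direct, sequential reference for both problems defined above; the function names are ours, and it says nothing about the real-time linear arrays the paper constructs. Running the inverse-convolution instance stated in the abstract reproduces the Möbius function.

```python
# Sequential reference for Dirichlet convolution and its inverse (index 0 unused,
# so f[n] holds f(n)); names and structure are ours, not the paper's arrays.

def convolve(f, g, N):
    """h(n) = sum over k*l = n of f(k) * g(l), for n = 1 .. N."""
    h = [0] * (N + 1)
    for k in range(1, N + 1):
        for l in range(1, N // k + 1):
            h[k * l] += f[k] * g[l]
    return h

def inverse_convolve(g, h, N):
    """Solve f * g = h for f, assuming g(1) is invertible."""
    f = [0] * (N + 1)
    for n in range(1, N + 1):
        s = sum(f[k] * g[n // k] for k in range(1, n) if n % k == 0)
        f[n] = (h[n] - s) // g[1]     # g(1) = 1 below, so integer division is exact
    return f

N = 10
ones = [0] + [1] * N                  # g(n) = 1 for all n
eps  = [0, 1] + [0] * (N - 1)         # eps(1) = 1, eps(n) = 0 for n > 1
print(inverse_convolve(ones, eps, N)[1:])   # Moebius: [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
```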

    A Decomposition Approach for Balancing Large-Scale Acyclic Data Flow Graphs

    In designing VLSI architectures for a complex computational task, the functional decomposition of the task into a set of computational modules can be represented as a directed task graph, and the inclusion of input data modifies the task graph into an acyclic data flow graph (ADFG). Because operands travel along different paths and computational modules have different computation times, operands may arrive at multi-input modules at different times, lengthening the pipeline period. Delay buffers may be inserted along various paths to balance the ADFG and achieve maximum pipelining. This paper presents an efficient decomposition technique that provides a more systematic approach to solving the optimal buffer assignment problem for an ADFG with a large number of computational nodes. The buffer assignment problem is formulated as an integer linear optimization problem that can be solved in pseudo-polynomial time. However, as the size of an ADFG increases, the number of integer linear constraints may grow exponentially, making the optimization problem more intractable. The decomposition approach utilizes the critical-path concept to decompose a directed ADFG into a set of connected subgraphs, and the integer linear optimization technique can then be used to solve the buffer assignment problem in each subgraph. In other words, a large-scale integer linear optimization problem is divided into a number of smaller-scale subproblems, each of which can easily be solved in pseudo-polynomial time. Examples are given to illustrate the proposed decomposition technique.
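
    The balancing constraint itself is easy to state in code. The sketch below equalizes operand arrival times on a toy acyclic graph by inserting buffers on the faster edges (one buffer per time unit, a simplifying assumption); it illustrates the constraints behind the paper's integer linear formulation, not its decomposition technique or its optimal buffer-count objective.

```python
# Minimal longest-path balancing of a toy acyclic data flow graph: every operand
# of a multi-input node is delayed so that all operands arrive simultaneously.
from graphlib import TopologicalSorter   # Python 3.9+

def balance(delays, edges):
    """Return arrival times and per-edge buffer counts (1 buffer = 1 time unit)."""
    preds = {v: [] for v in delays}
    for u, v in edges:
        preds[v].append(u)
    order = TopologicalSorter({v: preds[v] for v in delays}).static_order()

    arrival, buffers = {}, {}
    for v in order:
        # a node fires once its slowest operand has arrived
        arrival[v] = max((arrival[u] + delays[u] for u in preds[v]), default=0)
        for u in preds[v]:
            buffers[(u, v)] = arrival[v] - (arrival[u] + delays[u])
    return arrival, buffers

# toy graph: the output of 'mul' reaches 'add' later than 'inp' does, so the
# short edge inp -> add needs buffers to keep the pipeline balanced
delays = {"inp": 1, "mul": 3, "add": 1}
edges = [("inp", "mul"), ("inp", "add"), ("mul", "add")]
print(balance(delays, edges)[1])
# {('inp', 'mul'): 0, ('inp', 'add'): 3, ('mul', 'add'): 0}
```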