Tight coupling of timing driven placement and retiming
Retiming is a widely investigated technique for performance optimization. In general, it modifies a circuit netlist extensively, leaving it unclear whether the achieved performance improvement will still hold after placement has been performed. This paper presents an approach for integrating retiming into a timing-driven placement environment. The experimental results show the benefit of the proposed approach on circuit performance in comparison with design flows that use retiming only as a pre- or post-placement optimization method.
Placement driven retiming with a coupled edge timing model
Retiming is a widely investigated technique for performance optimization. It performs powerful modifications on a circuit netlist. However, it is often unclear whether the predicted performance improvement will still hold after placement has been performed. This paper presents a new retiming algorithm that uses a highly accurate timing model, taking into account the effect of retiming on the capacitive loads of single wires as well as fanout systems. We propose the integration of retiming into a timing-driven standard cell placement environment based on simulated annealing. Retiming is used as an optimization technique throughout the whole placement process. The experimental results show the benefit of the proposed approach. In comparison with the conventional design flow based on standard FEAS, our approach achieved an improvement in cycle time of up to 34%, and 17% on average.
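To make the retiming operation in the two abstracts above concrete, the following is a minimal sketch of the classical retiming model, in which an edge from u to v carrying w registers carries w + r(v) - r(u) registers after retiming with labels r. It is not the algorithm of either paper: it ignores wire capacitance and placement entirely, and the names `retimed_weights` and `clock_period` and the three-node example circuit are assumptions made for illustration.

```python
def retimed_weights(edges, r):
    """Apply retiming labels r to register counts on edges.

    edges: dict mapping (u, v) -> number of registers on the edge u -> v.
    r:     dict mapping node -> integer retiming label.
    """
    new_edges = {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}
    if any(w < 0 for w in new_edges.values()):
        raise ValueError("illegal retiming: negative register count")
    return new_edges


def clock_period(delay, edges):
    """Longest total gate delay along any path of register-free edges."""
    zero_succs = {v: [] for v in delay}
    indegree = {v: 0 for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            zero_succs[u].append(v)
            indegree[v] += 1

    # Topological sweep over the register-free subgraph.
    arrival = {v: delay[v] for v in delay}
    ready = [v for v in delay if indegree[v] == 0]
    seen = 0
    while ready:
        u = ready.pop()
        seen += 1
        for v in zero_succs[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if seen != len(delay):
        raise ValueError("combinational cycle (a cycle without registers)")
    return max(arrival.values())


# Hypothetical three-gate ring: a -> b -> c -> a, with two registers on c -> a.
delay = {"a": 3, "b": 3, "c": 3}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}

before = clock_period(delay, edges)
after = clock_period(delay, retimed_weights(edges, {"a": 0, "b": 0, "c": 1}))
print(before, after)
```

In this toy circuit the label r(c) = 1 moves one register from the output of c to its input, which shortens the longest register-free path. A placement-aware flow such as the one described above would additionally re-evaluate wire delays after each such move.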
Systematic design of two level pipelined systolic arrays with data contraflow
Many systolic algorithms and related design methodologies have recently been proposed. Frequently, these systolic algorithms do not take practical considerations into account. An evenly distributed load between processing elements, pipelined functional units, and so on are desirable features when implementing systolic algorithms. In this paper we present a design methodology in which these features are considered. As an example, the methodology is applied to obtain a problem-size-independent, two-level pipelined 1D systolic algorithm with data contraflow to efficiently solve triangular systems of equations.
On synthesizing systolic arrays from recurrence equations with linear dependencies
We present a technique for synthesizing systolic architectures from Recurrence Equations. A class of such equations (Recurrence Equations with Linear Dependencies) is defined, and the problem of mapping such equations onto a two-dimensional architecture is studied. We show that such a mapping is provided by means of a linear allocation and timing function. An important result is that under such a mapping the dependencies remain linear. After obtaining a two-dimensional architecture by applying such a mapping, a systolic array can be derived if the communication can be spatially and temporally localized. We show that a simple test consisting of finding the zeroes of a matrix is sufficient to determine whether this localization can be achieved by pipelining, and we give a construction that generates the array when such pipelining is possible. The technique is illustrated by automatically deriving a well-known systolic array for factoring a band matrix into lower and upper triangular factors.
Systolic array synthesis by static analysis of program dependencies
We present a technique for mapping recurrence equations to systolic arrays. While this problem has been studied in considerable detail, the recurrence equations analysed here are a generalization of those studied previously. In an earlier paper [14] we showed how systolic arrays can be synthesized from such generalized recurrence equations by a combination of affine transformations and explicit pipelining. This paper extends the results in two directions. First, a multistage pipelining technique is proposed, which permits the synthesis of systolic arrays with irregular data flow. Second, we develop analysis techniques for the synthesis of systolic arrays whose computation is governed by control signals, in a systematic manner that is amenable to mechanization. The full paper also discusses how these techniques can be applied to the mapping problem for more general architectures.
Systolic convolution of arithmetic functions
Given two arithmetic functions $f$ and $g$, their convolution $h = f * g$ is defined as $h(n) = \sum_{kl = n,\ 1 \le k, l \le n} f(k)\, g(l)$ for all $n \ge 1$. Given two arithmetic functions $g$ and $h$, the inverse convolution problem is to determine $f$ such that $f * g = h$. In this paper, we propose two linear arrays for the real-time computation of the convolution and the inverse convolution problem. These arrays extend the design of Verhoeff for the computation of the Möbius function $\mu$, defined as the solution of the inverse convolution problem $\mu * g = \Delta$, where $g(n) = 1$ for all $n \ge 1$ and $\Delta(n) = 1$ if $n = 1$, $\Delta(n) = 0$ if $n > 1$.
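The convolution and inverse convolution defined in this abstract can be sketched sequentially as follows. This is only a software illustration of the two definitions, not the linear systolic arrays the paper proposes; the function names are assumptions, and the inverse routine assumes g(1) is nonzero so that the recurrence can be solved for f(n).

```python
from fractions import Fraction


def convolution(f, g, n_max):
    """Return [h(1), ..., h(n_max)] where h(n) sums f(k) * g(l) over k * l = n."""
    h = [0] * (n_max + 1)  # index 0 is unused
    for k in range(1, n_max + 1):
        for l in range(1, n_max // k + 1):
            h[k * l] += f(k) * g(l)
    return h[1:]


def inverse_convolution(g, h, n_max):
    """Return [f(1), ..., f(n_max)] such that the convolution of f and g is h.

    Uses h(n) = f(n) * g(1) + sum of f(k) * g(n // k) over divisors k < n,
    so each f(n) follows from the values already computed. Assumes g(1) != 0.
    """
    f = [0] * (n_max + 1)  # index 0 is unused
    for n in range(1, n_max + 1):
        partial = sum(f[k] * g(n // k) for k in range(1, n) if n % k == 0)
        f[n] = (Fraction(h(n)) - partial) / g(1)
    return f[1:]


def one(n):
    """The constant function g(n) = 1."""
    return 1


def unit(n):
    """The function that is 1 at n = 1 and 0 for n > 1."""
    return 1 if n == 1 else 0


# Convolving the constant function with itself counts the divisors of n.
print(convolution(one, one, 6))

# Solving the inverse problem with g = one and h = unit gives the Moebius function.
print([int(x) for x in inverse_convolution(one, unit, 10)])
```

Exact rational arithmetic via `Fraction` is a design choice here to avoid rounding when g(1) is not 1 or -1; for the Möbius case all results are integers.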
A Decomposition Approach for Balancing Large-Scale Acyclic Data Flow Graphs
In designing VLSI architectures for a complex computational task, the functional decomposition of the task into a set of computational modules can be represented as a directed task graph, and the inclusion of input data modifies the task graph to an acyclic data flow graph (ADFG). Because paths differ in length and computational modules differ in computation time, operands may arrive at multi-input modules at different times, which lengthens the pipeline period. Delay buffers may be inserted along various paths to balance the ADFG to achieve maximum pipelining. This paper presents an efficient decomposition technique which provides a more systematic approach to solving the optimal buffer assignment problem of an ADFG with a large number of computational nodes. The buffer assignment problem is formulated as an integer linear optimization problem which can be solved in pseudo-polynomial time. However, as the size of an ADFG increases, the number of integer linear constraint equations may grow exponentially, making the optimization problem intractable. The decomposition approach utilizes the critical path concept to decompose a directed ADFG into a set of connected subgraphs, and the integer linear optimization technique can be used to solve the buffer assignment problem in each subgraph. In other words, a large-scale integer linear optimization problem is divided into a number of smaller-scale subproblems, each of which can be easily solved in pseudo-polynomial time. Examples are given to illustrate the proposed decomposition technique.
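The balancing idea described in this abstract can be illustrated with a simple longest-path baseline. The sketch below starts every module as early as possible and pads each edge with the slack between the producer's finish time and the consumer's start time. It is not the paper's integer linear formulation or its decomposition technique, and it does not claim to minimize the total number of buffers. The function name `balance_adfg` and the four-node example graph are assumptions.

```python
from collections import defaultdict, deque


def balance_adfg(delay, edges):
    """Compute start times and per-edge buffer delays that balance an ADFG.

    delay: dict mapping node -> computation time of that module.
    edges: list of (u, v) pairs meaning the output of u feeds v.
    Returns (start, buffers), where buffers[(u, v)] is the delay to insert
    on edge u -> v so that all operands of v arrive at the same time.
    """
    succs = defaultdict(list)
    indegree = {v: 0 for v in delay}
    for u, v in edges:
        succs[u].append(v)
        indegree[v] += 1

    # Longest-path (as-soon-as-possible) start times in topological order.
    start = {v: 0 for v in delay}
    ready = deque(v for v in delay if indegree[v] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in succs[u]:
            start[v] = max(start[v], start[u] + delay[u])
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if visited != len(delay):
        raise ValueError("graph contains a cycle, so it is not an ADFG")

    # Slack on each edge becomes the buffer delay on that edge.
    buffers = {(u, v): start[v] - (start[u] + delay[u]) for u, v in edges}
    return start, buffers


# Hypothetical example: c consumes a and b; d consumes a and c.
delay = {"a": 1, "b": 3, "c": 1, "d": 2}
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("c", "d")]
start, buffers = balance_adfg(delay, edges)
print(start)
print(buffers)
```

Edges on the critical path receive zero buffering in this baseline, which is consistent with the abstract's use of the critical path as the basis for decomposition. An optimal assignment would additionally choose start times to minimize the total buffer count.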