This paper introduces a new logic transformation that integrates retiming with algebraic and Boolean transformations at the technology-independent level. It offers an additional degree of freedom in sequential network optimization resulting from implicit retiming across logic blocks and fanout stems. The application of this transformation to sequential network synthesis results in the optimization of logic across register boundaries. We have implemented our new technique within the SIS framework and demonstrated its effectiveness in terms of cycle-time minimization on a set of sequential benchmark circuits.
INTRODUCTION
Over the years, sequential circuit synthesis has been a subject of intensive investigation. Although synthesis of combinational logic has attained a significant level of maturity, sequential circuit synthesis is lagging behind. In the current state of affairs, sequential networks are first optimized by applying combinational network transformations to the logic between the register boundaries, and then mapped into a gate-level network. The resulting network is then often optimized by applying the retiming transformation [8].
Retiming is the process of relocating the registers across logic gates without affecting the underlying combinational logic structure. It can be used to minimize the cycle time or the number of registers under a cycle-time constraint.
While in principle retiming can be applied at various levels of synchronous system design, it has traditionally been used as a structural transformation in gate-level circuit optimization. As such, gate-level retiming exploits only one degree of freedom in circuit optimization, namely, the relocation of registers. Furthermore, gate-level retiming does not take into account the prospective logic simplification. The potential for optimization by subsequent resynthesis is very limited, as resynthesis is typically applied only to the logic between register boundaries.
In this paper we investigate the issue of retiming at the technology-independent level. We introduce a novel and efficient approach to synthesis and optimization of synchronous sequential circuits, in which retiming is performed implicitly during logic optimization.
There have been several attempts to combine retiming with algebraic network transformations in the quest to optimize the logic across register boundaries. Peripheral retiming, introduced by Malik et al. [10], optimizes the underlying combinational logic after a temporary relocation of registers to the periphery of the circuit. It suffers from a limited mobility of registers during the peripheral movement phase. De Micheli [2] introduced the concept of synchronous divisors and used it in local logic optimization across the register boundaries. Lin [9] formalized the theory for synchronous extraction to detect potential common divisors. Both methods operate on the structural specification of a synchronous circuit, and do not take into account the prospective logic simplification during synchronous division. Dey et al. [3] proposed a method to improve the effectiveness of retiming by attempting to eliminate retiming bottlenecks. Chakradhar et al. [1] introduced special timing constraints which are used to resynthesize the circuit. The modified circuit is subsequently retimed, and the constraints (if satisfied by the delay optimizer) guarantee that the circuit is retimable and meets the desired cycle time.
Retiming has also been used in the context of minimizing latency (rather than clock period) in pipelined circuits. A number of papers addressed the problem of combining retiming with architectural and structural transformations to improve latency and/or throughput. The scheme proposed by Potkonjak et al. [11] uses retiming to enable algebraic transformations that can further improve latency/throughput. Hassoun et al. [5] introduced the concept of architectural retiming, which attempts to increase the number of registers on a latency-constrained path without increasing the overall latency. These seemingly contradictory goals are achieved by implementing "negative" registers using precomputation and prediction techniques. In the process, the circuit is structurally modified to preserve its functionality.
Most of the techniques mentioned above operate on a structural representation of the network. The cost function that guides retiming in network optimization does not take into account the potential for subsequent logic simplification. In contrast, our approach takes into account the effect of retiming on logic simplification. It operates directly on a functional specification given in terms of synchronous Boolean expressions. It is an iterative synthesis process which integrates retiming with extraction, collapsing, and node simplification into one synchronous transformation. It efficiently handles retiming across fanout stems while preserving the initial state. It also provides a simple method to compute the initial state of the resynthesized circuit, consistent with the original network specification.
PRELIMINARIES
A Boolean function $f$ of $n$ variables is a mapping $f : B^n \to B$, where $B = \{0, 1\}$. A literal is a Boolean variable or its complement. A cube is defined as a product of literals. The support of a Boolean function is defined as the set of all variables that appear in the function. An expression is said to be cube-free when it cannot be factored by a cube. A kernel of an expression is a cube-free quotient of the expression divided by a cube. Extraction is the process of factoring out a subexpression from one or more logic functions of a network and creating a new node for the extracted expression. Collapsing or elimination is the process of (re)expressing a Boolean function representing a node in the logic network in terms of the support variables of its fanin node.
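The definitions of cube, cube-free expression, and kernel above can be made concrete with a small sketch. It assumes a sum-of-products expression is stored as a set of cubes, each cube being a set of literal names; the helper names (`sop`, `divide_by_cube`, `is_cube_free`, `kernels`) are illustrative and are not taken from SIS. The kernel search is a brute-force enumeration meant only to illustrate the definition, not the rectangle-based method used in practice.

```python
from itertools import combinations

def sop(*cubes):
    """Build a sum-of-products; each argument is one cube with space-separated literals."""
    return frozenset(frozenset(c.split()) for c in cubes)

def divide_by_cube(expr, cube):
    """Algebraic quotient of expr by a single cube."""
    return frozenset(c - cube for c in expr if cube <= c)

def is_cube_free(expr):
    """True if expr has at least two cubes and no literal is common to all of them."""
    return len(expr) > 1 and not frozenset.intersection(*expr)

def kernels(expr):
    """All kernels of expr by brute force (exponential in the number of cubes)."""
    found = set()
    cubes = list(expr)
    for size in range(2, len(cubes) + 1):
        for subset in combinations(cubes, size):
            # The largest cube dividing every cube in the subset is a candidate co-kernel.
            co_kernel = frozenset.intersection(*subset)
            quotient = divide_by_cube(expr, co_kernel)
            if is_cube_free(quotient):
                found.add(quotient)
    return found

# F = ac + ad + bc + bd + e = (a + b)(c + d) + e
F = sop("a c", "a d", "b c", "b d", "e")
K = kernels(F)
```

For this `F`, the kernels are `a + b`, `c + d`, and `F` itself (the quotient by the trivial cube 1).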
Forward retiming is the operation of shifting the registers from the inputs to the outputs of a node in a Boolean network; backward retiming is the reverse operation. A node in the network can represent an arbitrary Boolean function. It has been shown that such a transformation preserves the behavior of the circuit [8]. Forward and backward retiming transformations are illustrated in Fig. 1(a). A node is said to be forward (backward) retimable if each of its input (output) edges contains a register. Retiming across a fanout stem is the operation of forward retiming of a multiple-fanout register across its fanout stem. Retiming across a fanout stem imposes an equivalence relation on the fanout registers. All network transformations and the initial state computation must take this register equivalence into account. An expression is called a retimable expression if all the variables in its support set are register variables. The variables $r_i$ and $R_i$ can be viewed as inputs and outputs, respectively, of the combinational part of the sequential network, with registers providing feedback paths.
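The two retimability conditions defined above reduce to simple checks. The following is a minimal sketch under an assumed representation: an expression is a set of cubes, a complemented literal is written with a trailing apostrophe, and each input edge of a node carries a register count. The function names are hypothetical.

```python
def sop(*cubes):
    """Build a sum-of-products; each argument is one cube with space-separated literals."""
    return frozenset(frozenset(c.split()) for c in cubes)

def support(expr):
    """Set of variables appearing in expr (a trailing ' marks a complemented literal)."""
    return {literal.rstrip("'") for cube in expr for literal in cube}

def is_retimable_expression(expr, register_vars):
    """True if every variable in the support of expr is a register variable."""
    s = support(expr)
    return bool(s) and s <= set(register_vars)

def is_forward_retimable(input_edge_registers):
    """True if every input edge of the node carries at least one register."""
    return bool(input_edge_registers) and all(n >= 1 for n in input_edge_registers)

registers = {"r1", "r2", "r3"}
k_r = sop("r1 r2", "r3")        # r1 r2 + r3: support contains only register variables
mixed = sop("r1 i2", "r3'")     # depends on primary input i2
```

Here `k_r` is a retimable expression, while `mixed` is not because its support includes the primary input `i2`.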
THEORY AND ALGORITHMS
Traditional retiming across a logic gate in a gate-level network (or across a node in a Boolean network) can be extended to a retiming across an arbitrary subexpression of the original logic function. Such a retiming, combined with the extraction of a suitable expression, forms the basis of our new sequential transformation. We shall refer to it as the logic retiming transformation, for lack of a better term. The following sections describe the operations involved in logic retiming.

[...] variables, this expression is forward retimable. Forward retiming across $V_{x5}$ leads to the creation of a new register represented by variables $(R_4, r_4)$. After the retiming, the expression for $R_4$ is given in terms of register input variables $R_i$, as illustrated in Fig. 2(b).
Retime Extraction
This transformation can be expressed as a new operation, called retime-extraction, which is the basis of our logic retiming transformation. For a given retimable expression $k_r$, the following steps implement retime-extraction:
1. For every node $f_i$ of the network containing expression $k_r$, substitute the expression with a variable $r_k$.
2. Introduce a new node corresponding to $k_r$ expressed in terms of register input variables $R_i$. Represent it by register function $R_k$.
3. Introduce a new register $(R_k, r_k)$.
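The three steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the SIS implementation: expressions are sets of cubes with positive literals only, step 1 uses algebraic (weak) division to find and replace $k_r$, and the small network below is hypothetical, chosen only so that its register functions agree with the expression for $R_4$ used in this section. The names `weak_divide` and `retime_extract` are illustrative.

```python
def sop(*cubes):
    """Build a sum-of-products; each argument is one cube with space-separated literals."""
    return frozenset(frozenset(c.split()) for c in cubes)

def weak_divide(f, d):
    """Algebraic division: return (q, rem) such that f = q*d + rem."""
    quotients = [{c - dc for c in f if dc <= c} for dc in d]
    q = set.intersection(*quotients)
    product = {qc | dc for qc in q for dc in d}
    return frozenset(q), frozenset(set(f) - product)

def retime_extract(network, registers, k_r, new_reg):
    """network: node name -> SOP; registers: r_i -> R_i; new_reg: (R_k, r_k)."""
    R_k, r_k = new_reg
    new_net = {}
    for name, f in network.items():
        q, rem = weak_divide(f, k_r)
        if q:
            # Step 1: replace the occurrence of k_r in this node by the variable r_k.
            f = frozenset(qc | {r_k} for qc in q) | rem
        new_net[name] = f
    # Step 2: new node R_k is k_r rewritten over register input variables R_i.
    new_net[R_k] = frozenset(frozenset(registers[v] for v in cube) for cube in k_r)
    # Step 3: new register (R_k, r_k).
    new_regs = dict(registers)
    new_regs[r_k] = R_k
    return new_net, new_regs

# Hypothetical network: R1 = (r1 r2 + r3) i2, R2 = i1 r2, R3 = i2 + i1 r3
network = {"R1": sop("r1 r2 i2", "r3 i2"), "R2": sop("i1 r2"), "R3": sop("i2", "i1 r3")}
registers = {"r1": "R1", "r2": "R2", "r3": "R3"}
new_net, new_regs = retime_extract(network, registers, sop("r1 r2", "r3"), ("R4", "r4"))
```

After the call, node `R1` reads `r4 i2` and the new node `R4` reads `R1 R2 + R3`; collapsing `R4` into its fanin nodes is a separate step.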
It should be emphasized that, whenever the register variables in the support of retimable expression $k_r$ fan out to other functions, the retime-extract operation involves implicit retiming across fanout stems. In our example this applies to registers $R_2$ and $R_3$, which have multiple fanouts. Consequently, a set of equivalence relations will be imposed on these registers and used in the subsequent logic simplification. On the other hand, if a register involved in the retime-extraction fans out only to the retimable expression, it will be rendered redundant by the transformation and subsequently removed. In our example, $R_1$ fans out only to the retime-extracted expression, and can be removed along with the associated logic function (Fig. 4).
Collapsing and Simplification
The next step is to collapse the node represented by the new variable $R_k$ into its fanin nodes, as shown in Fig. 3. The resulting expression is then simplified. Notice the implicit duplication of logic, necessary to perform the collapsing and simplification. This ensures that the functionality of the rest of the network remains unchanged. In our case, the logic for $R_1$, $R_2$, $R_3$ is duplicated (see the area marked by the dotted line). The simplification is possible, in effect, due to the register equivalence imposed on fanout registers. For simplicity, in all the figures, we use the same variable name for each of the registers obtained after retiming across a fanout.
In our case, the collapsing and simplification lead to the following expression:

$$R_4 = R_1 R_2 + R_3 = (r_4 i_2)(i_1 r_2) + (i_2 + i_1 r_3) = i_2 + i_1 r_3$$
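The last equality follows from absorption: the product term $i_1 i_2 r_2 r_4$ is covered by the single-literal cube $i_2$. As a quick sanity check, the sketch below compares both sides of the expression over all input assignments; the function names `collapsed` and `simplified` are illustrative.

```python
from itertools import product

def collapsed(i1, i2, r2, r3, r4):
    """R4 = R1 R2 + R3 with R1 = r4 i2, R2 = i1 r2, R3 = i2 + i1 r3."""
    R1 = r4 and i2
    R2 = i1 and r2
    R3 = i2 or (i1 and r3)
    return (R1 and R2) or R3

def simplified(i1, i2, r2, r3, r4):
    """Simplified form i2 + i1 r3 (r2 and r4 drop out of the support)."""
    return i2 or (i1 and r3)

# Exhaustive comparison over all 2^5 assignments.
equivalent = all(
    collapsed(*v) == simplified(*v) for v in product([False, True], repeat=5)
)
```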
The simplified Boolean expression for $R_k$ is also referred to as a retime-expression $RE(k_r)$. It can be calculated for every retimable kernel or cube $k_r$ using the above procedure. The computation of $RE(k_r)$ is central to the logic retiming transformation. In our example, the expressions associated with $V_{x5}$ and $V_{x4}$ are the same ($i_2 + i_1 r_3$), and hence $V_{x5}$ can be removed, as shown in Fig. 4(a). Finally, notice that register function $R_1$ is not used. This is because the register disappeared as a result of retime-extraction across $r_1 r_2 + r_3$. Therefore, the combinational logic function associated with the register function can be deleted. The resulting network is shown in Fig. 4(b). Furthermore, since the register functions $R_3$ and $R_4$ are identical, the two registers could be merged into one, provided that their initial states are identical, i.e., $r_3^0 = r_4^0$. Whether this is possible depends on the initial conditions imposed on the network (see the next section on initial state computation).

Figure 3: Collapsing of $R_4$ into its fanin nodes.
Cost Modeling
Logic retiming is characterized by several important properties, which are illustrated conceptually in Fig. 5. First, it can be shown that logic retiming does not degrade the overall cycle time under the unit-delay model. Since the retime-expression node $RE(k_r)$ is obtained by means of collapsing and simplification, it will always be appended to the network at the same (last) level as the nodes that are collapsed into it. By definition, the arrival time at the output of this node will be no greater than the latest arrival time in the rest of the network. Hence adding a retime-expression node will not increase the topological longest path under this model. Realistically, since retime-extraction may increase fanout on some of the nodes (for example, node $i_1$ in Fig. 4), the critical path delay could actually increase. This may happen, for example, when a node on a critical path fans out to the newly created node $RE(k_r)$ (see node $V_1$ in the figure). This problem can be identified by considering an augmented delay model which takes the fanout factor into consideration. Finally, observe that the complexity of a node (measured, e.g., in the number of literals) that is affected by retime-extraction will always be reduced by the extraction of the retimable expression (see node $V$ in the figure). Since it can be argued that the complexity of a node reflects to a certain degree its delay, the delay of the critical path will be reduced, provided that retime-extraction [...]

The key element to the efficiency of logic retiming is accurate estimation of the cost associated with a given retimable expression. Fig. 6 illustrates the idea of cost estimation based on a simple literal count. It is important to note that the two candidate nodes, $k_r$ and $RE(k_r)$, are not yet part of the network. Two gains are computed: $x$ for standard extraction, and $r$ for retime-extraction.
The literal counts of nodes $V_1$, $V_2$, $V_3$ are computed before extraction or retime-extraction; these include the literals of $k_r$. Retime-extraction (which results in the addition of node $RE(k_r)$) is performed if $r < x$. Also note that while this approach emphasizes the delay, it can also be used to target the logic area (approximated by the total number of literals). The gain in area can be computed by comparing $r$ with $y = \mathrm{lit\_count}(k_r)$.
Depending on the depth of collapsing and the amount of logic simplification, $r$ may be greater or smaller than $y$.
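As a rough illustration of the literal-count comparison, the sketch below computes $y = \mathrm{lit\_count}(k_r)$ for the running example and compares it with the literal count of the simplified retime-expression. It assumes, for illustration only, that the area comparison uses the literal count of $RE(k_r)$ as $r$; the exact formulas for $x$ and $r$ are defined with respect to Fig. 6 and are not reproduced here. The helper names are hypothetical.

```python
def sop(*cubes):
    """Build a sum-of-products; each argument is one cube with space-separated literals."""
    return frozenset(frozenset(c.split()) for c in cubes)

def lit_count(expr):
    """Number of literals in the SOP form of expr."""
    return sum(len(cube) for cube in expr)

k_r = sop("r1 r2", "r3")      # retimable expression r1 r2 + r3
re_k = sop("i2", "i1 r3")     # simplified retime-expression i2 + i1 r3

y = lit_count(k_r)
r_assumed = lit_count(re_k)   # assumption: r taken as lit_count(RE(k_r))
area_delta = r_assumed - y    # > 0: area grows, < 0: area shrinks, 0: neutral
```

Under this assumption the running example is area-neutral, since both expressions have three literals.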
The initial experiments have shown that even this simplistic gain metric can result in cycle-time reduction. A more accurate approach, currently being considered, involves a fast node decomposition using Time-Driven Cofactoring (TDC) [4].
Logic Retiming Algorithm
Logic retiming is an iterative operation comprised of the following steps: 
Comparison with Extraction and Gate-level Retiming
The following example illustrates that logic retiming can lead to circuit optimization (both in terms of delay and area) that is not possible with conventional multi-level synthesis or gate-level retiming alone, see 
IMPLEMENTATION AND EXPERIMENTAL RESULTS
We implemented the logic retiming transformation within the SIS framework. The implementation of the logic retiming algorithm involves the generation of common subexpressions of the combinational part of the network; these common subexpressions are generated using the rectangle intersection algorithm used in SIS. Only those kernels whose value exceeds the user-defined threshold are selected. For each of the selected kernels we compare the regular extraction value with the retime-extraction value using the gain estimation technique.
The cost function, used in our preliminary experiments, is the number of literals in the SOP form, as discussed in section 3.4. Although simplistic, this cost function allowed us to quickly validate the theory. This version of logic retiming yielded delay improvements over the regular extraction transformation.
Research is now focused on the application of the concept of retime-extraction to the transformations used in script.delay and other delay optimization techniques such as speed_up. We are also investigating the application of logic retiming to area minimization under cycle-time constraints.
We have tested our technique on a number of sequential circuits from the ISCAS'91 benchmark set. The circuits were input as logic networks in blif format, and their local functions (nodes) were collapsed into SOP form. Each circuit was then optimized using logic retiming and independently synthesized with standard SIS multi-level optimization. The circuits were resynthesized and mapped into the standard SIS lib2.genlib library. The script used for logic retiming is identical to the script with conventional SIS transformations, except that the gkx command has been replaced by the "retime kernel extract" (rkx) command of logic retiming. The general structure of the scripts used in our experiments is given below:

The results are reported in Table 1, which compares the clock-cycle delay, number of registers, and area overhead of the circuits obtained by the two flows. The delay was computed using the mapped delay model. Those circuits which did not contain any retimable kernels are not shown in the table.
Even though our initial implementation of logic retiming used a simplistic
