The power of retiming is often limited by the underlying topology of a computational structure. We combine retiming with a complete set of algebraic transformations in an iterative improvement framework, in which retiming and algebraic speed-up algorithms are applied successively so that the latter enables the former. The key part of the approach is a new algebraic speed-up algorithm, used for the first time in high-level synthesis, that transforms algebraic expressions so that an arbitrary set of input arrival times and output required times is satisfied. Since the new method moves delays forward only, and retiming is done locally and very infrequently, it also always calculates the new initial state efficiently. The proposed approach has yielded results better than or equal to the best previously published on all benchmark examples and on several novel real-life examples.
For simplicity, we assume that each operation in the FIR filter example takes one control cycle. The initial critical path is 5 control cycles long (see Figure 1a). It is easy to see that retiming is not effective on this example and cannot reduce the critical path. Associativity and commutativity, as used in several
30th ACM/IEEE Design Automation Conference. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
Figure 2c shows the result of common subexpression elimination on the expression tree. Note that this step identifies that the input "e" is involved in three multiplication operations and transforms the expression so that only a single multiplication operation remains.
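The effect of this step can be illustrated with a small sketch. The expression used here is hypothetical: the operand names a, b and c are not taken from Figure 2, and the code only assumes an input e that is shared by three products. It checks that factoring e out by distributivity preserves the value while leaving a single multiplication.

```python
from collections import Counter

# Expressions are nested tuples: ("+", left, right), ("*", left, right),
# or a string naming an input. The operand names a, b, c are hypothetical.

def evaluate(expr, env):
    """Evaluate an expression tree for the input values in env."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lv, rv = evaluate(left, env), evaluate(right, env)
    return lv + rv if op == "+" else lv * rv

def count_ops(expr):
    """Count the operations of each kind in an expression tree."""
    if isinstance(expr, str):
        return Counter()
    op, left, right = expr
    return Counter({op: 1}) + count_ops(left) + count_ops(right)

# a*e + b*e + c*e: the input e feeds three multiplications.
before = ("+", ("+", ("*", "a", "e"), ("*", "b", "e")), ("*", "c", "e"))
# (a + b + c) * e: after factoring, e feeds a single multiplication.
after = ("*", ("+", ("+", "a", "b"), "c"), "e")

env = {"a": 2, "b": 3, "c": 5, "e": 7}
assert evaluate(before, env) == evaluate(after, env)
print(count_ops(before)["*"], count_ops(after)["*"])
```

The number of additions is unchanged by the rewrite; only the multiplications that share the input e are merged.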
After applying level reduction, distributivity, and common subexpression elimination, associativity is applied to restructure the CDFG into two-input operations. Figure 2d shows the expression tree rearranged in accordance with the associativity laws so as to meet the arrival time constraints of the inputs. In the case of applying the algorithm to 
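One way to picture arrival-time-driven restructuring is the greedy sketch below. It is not the algorithm of this paper: it is a common Huffman-style heuristic, assuming that the operation is both associative and commutative, that every two-input operation has the same delay, and that the arrival times (all hypothetical here) are known. At each step it combines the two operands that become available earliest.

```python
import heapq
from itertools import count

def restructure(arrivals, op_delay=1):
    """Build a tree of two-input operations over the given operands,
    always combining the two earliest-available operands.

    arrivals maps operand name -> arrival time in control cycles and
    must be non-empty. Returns (completion_time, tree), where tree is
    a nested tuple of operand names.
    """
    tie = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(t, next(tie), name) for name, t in arrivals.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        t1, _, x = heapq.heappop(heap)
        t2, _, y = heapq.heappop(heap)
        # The result is ready one operation delay after the later operand.
        heapq.heappush(heap, (max(t1, t2) + op_delay, next(tie), (x, y)))
    finish, _, tree = heap[0]
    return finish, tree

# Hypothetical arrival times: d arrives late, so it ends up next to the root.
finish, tree = restructure({"a": 0, "b": 0, "c": 0, "d": 3})
print(finish, tree)
```

With these arrival times, a balanced tree (a + b) + (c + d) would finish at cycle 5, because c + d cannot start before cycle 3. The greedy tree finishes at cycle 4 by combining the early operands first and adding d last.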
Software Environment and Experimental Results
Although there is a close technical relationship between logic synthesis and high-level synthesis, the software tools used in those two areas have been developed almost without any interaction. Our goal was to leverage this relationship as much as possible and avoid rewriting software, so we interfaced tools from the two fields.
ERB takes as input circuits described in the Berkeley Logic Interchange Format (BLIF) and translates them to a data structure like that of the logic synthesis system SIS [Sen92].
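As a rough illustration of this kind of translation, the sketch below reads a small subset of BLIF (`.model`, `.inputs`, `.outputs`, `.names`, `.end`) into a dictionary-based network. The function name `parse_blif`, the dictionary layout, and the example circuit are assumptions made for illustration; they do not reproduce the data structures of ERB or SIS, and line continuations and latches are not handled.

```python
def parse_blif(text):
    """Parse a small subset of BLIF into a dictionary-based network.

    Each .names block becomes a node keyed by its output signal, with
    its fan-in signals and its single-output cover rows.
    """
    net = {"model": None, "inputs": [], "outputs": [], "nodes": {}}
    current = None  # node whose cover rows are currently being read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        tokens = line.split()
        key = tokens[0]
        if key == ".model":
            net["model"] = tokens[1]
        elif key == ".inputs":
            net["inputs"].extend(tokens[1:])
        elif key == ".outputs":
            net["outputs"].extend(tokens[1:])
        elif key == ".names":
            *fanins, out = tokens[1:]  # last signal is the node output
            current = {"fanins": fanins, "cover": []}
            net["nodes"][out] = current
        elif key == ".end":
            break
        elif not key.startswith(".") and current is not None:
            current["cover"].append(tuple(tokens))
    return net

# Hypothetical circuit: f = (a AND b) OR c
example = """\
.model and_or
.inputs a b c
.outputs f
.names a b t
11 1
.names t c f
1- 1
-1 1
.end
"""
net = parse_blif(example)
print(net["nodes"]["f"])
```

Keying the nodes by output signal makes it easy to walk from a primary output back through its fan-ins, which is the kind of traversal a speed-up pass needs.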
Using a cnode cover algorithm with the SIS speed-up algorithm as an engine, ERB provides an ideal software environment for implementing a single step of our iterative improvement algorithm. We replaced the logic synthesis speed-up algorithm with our new algebraic speed-up code. Since both logic synthesis and high-level synthesis software use very similar data structures, writing an interface between the two data structures was straightforward.
As the front end and for simulation, we used the HYPER high-level synthesis system obtained from the University of 
