Optimizing Non-Monotonic Interconnect using Functional Simulation and Logic Restructuring by Stephen M. Plaza et al.
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 12, DECEMBER 2008 2107
Optimizing Nonmonotonic Interconnect Using
Functional Simulation and Logic Restructuring
Stephen M. Plaza, Igor L. Markov, Senior Member, IEEE, and Valeria M. Bertacco, Member, IEEE
Abstract—The relatively poor scaling of interconnect in modern
digital circuits necessitates a number of design optimizations,
which must typically be iterated several times to meet the speci-
ﬁed performance objectives. Such iterations are often due to the
difﬁculty of early delay estimation, particularly before placement.
Therefore, effective logic restructuring to reduce interconnect de-
lay has been a major challenge in physical synthesis, a phase dur-
ing which more accurate delay estimates can be ﬁnally gathered.
In this paper, we develop a new approach that enhances modern
high-performance logic synthesis techniques with ﬂexibility and
accuracy in the physical domain. This approach is based on the
following: 1) a novel criterion based on path monotonicity, which
identiﬁes those interconnects that are amenable to optimization
through logic restructuring, and 2) a synthesis algorithm relying
on logic simulation and placement information to identify placed
subcircuits that hold promise for interconnect reduction. Experi-
ments indicate that our techniques ﬁnd optimization opportunities
and improve interconnect delay by 11.7% on average at less than
2% wirelength and area overhead.
Index Terms—Logic simulation, logic synthesis, physical
synthesis.
I. INTRODUCTION
A
S INTERCONNECT contributes an increasingly signiﬁ-
cant fraction of overall circuit delay, the focus of design
methodology is shifting from logic optimization to interconnect
optimization. While this transition has been occurring for over a
decade, meeting performance objectives is becoming more and
more difﬁcult. In recent years, a few successful methodologies
achieved timing closure by combining netlist-level minimiza-
tion in logic synthesis with postplacement physical optimiza-
tions. This family of solutions is known as physical synthesis.
Related strategies, including interconnect buffering [21], gate
sizing [18], and relocation [1], successfully improved delay.
In [8], [11], [17], and [32], postplacement resynthesis achieved
delay improvement with limited placement perturbation, but
these techniques are limited to simple signal substitution trans-
formations. As the major portion of the critical delay is shifting
into interconnect [38], poor design choices during synthe-
sis cannot be easily corrected by limited-scale postplacement
optimizations. Therefore, more accurate delay models have
Manuscript received May 13, 2008; revised July 24, 2008. Current version
published November 19, 2008. This paper was recommended by Associate
Editor G.-J. Nam.
Theauthorsarewith theDepartment of Electrical Engineering and Computer
Science, University of Michigan, Ann Arbor, MI 48109-2121 USA (e-mail:
splaza@umich.edu; imarkov@umich.edu; valeria@umich.edu).
Color versions of one or more of the ﬁgures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identiﬁer 10.1109/TCAD.2008.2006156
been developed to guide logic synthesis. Wire-load models
that estimate delay by considering the capacitive load of each
net were effective until wire capacitance and resistance be-
came predominant. Further knowledge of the impact of place-
ment on wirelength was consequently needed by synthesis
algorithms.
To meet the challenge of performance optimization at the
130-nm technology node and beyond, the traditional design
ﬂow transformed from several discrete optimization phases
(such as logic synthesis followed by place and route) into a
more holistic strategy. In [14], wirelength estimation was in-
corporated in logic synthesis by constructing a highly placeable
netlist with the goal of reducing wire detours. In addition,
topographical information has been used to guide current syn-
thesis tools [40]. Due to the importance and inherent difﬁ-
culty of estimating the impact of placement and routing on
interconnect, researchers suggested the idea of maintaining a
companion placement estimate throughout the logic synthesis
process [10], [15], [26]. However, interconnect-aware logic
transformations are still limited by the accuracy of the estimates
available. Furthermore, guiding logic synthesis by conservative
delay estimates, as in [14], can lead to transformations that do
not improve critical path delay but increase area and power
consumption.
While aggressive logic restructuring using global intercon-
nect information can exploit better estimates later in the de-
sign ﬂow, such accounts have eluded published literature. One
particular complication is that the limited amount of ﬂexibil-
ity found in combinational circuits must be combined with
physical aspects of performance optimization. In this paper,
we introduce a postplacement solution that enables aggressive
optimization while minimizing changes to the physical netlist.
We consider a wide range of changes to the circuit structure
while also tracking their impact on physical parameters.
Our contributions are as follows.
1) A novel metric for efﬁciently identifying nonmonotonic
paths in the circuit, which locates regions where restruc-
turing provides the greatest gains. This metric generalizes
the metric in [4] and considers longer paths.
2) A generic and powerful technique for discovering logic
transformations using functional simulation, which also
facilitates fast reevaluation of physical parameters. Our
technique does not require local equivalence between
the optimized subcircuit and the original one but uses
simulation and satisﬁability to ensure that the circuit’s
functionality is unmodiﬁed.
0278-0070/$25.00 © 2008 IEEE
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.2108 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 12, DECEMBER 2008
Fig. 1. Resynthesis of a nonmonotonic path can produce much shorter critical
paths and improve routability.
3) A suite of powerful algorithms that efﬁciently exploit a
circuit’s don’t cares and avoid heavyweight techniques
traditionallyusedinlogicsynthesiswhileallowingtighter
integration with placement and a more realistic delay
calculation.
In our methodology, we ﬁrst identify detoured wires that
lie on critical paths. As mentioned in [4], many critical paths
cannot be improved through cell relocation and better timing-
driven placement. Furthermore, the inaccuracy of timing es-
timates before detailed placement limits the effectiveness of
techniques from [14] in eliminating path nonmonotonicity. We
target these nonmonotonic paths for resynthesis by generating
different logic topologies that improve circuit delay. To efﬁ-
ciently ﬁnd these topologies, we abstract away circuit com-
plexity using logic simulation. Through logic simulation, we
partiallycharacterize thebehavior ofeachnode inthesubcircuit
with a signature [7], [8], [28]. We then use these signatures
to determine whether a logic transformation generating the
desired topology is possible.
In the example of Fig. 1, we show that by applying our
technique, a subcircuit with a long critical path can be trans-
formed to a functionally equivalent subcircuit with smaller
critical path delay. Unlike most techniques from logic syn-
thesis, our circuit restructuring can work directly on mapped
circuits with complex standard cells. Another novel feature is
our extensive use of circuit ﬂexibility due to signal masking
by downstream logic, also known as observability don’t cares
(ODCs). Additionally, our approach uses controllability don’t
cares (CDCs), i.e., circuit ﬂexibility due to upstream logic.
Compared to the work in [34], our approach exploits global
don’t cares to enhance logic restructuring. In [17], redundancy
addition and removal (RAR) are used to improve circuit timing.
However, these rewiring techniques consider only a subset of
our transformations, where we use redundancy and physical
information in conjunction to directly guide the resynthesis of
subcircuits containing multiple cells.
Our experiments indicate that large circuits often contain
manylongcriticalpathsthatcanbeeffectivelytargetedwithour
restructuring. Improving these paths results in consistent delay
improvements of 11.7% on average with minimal degradation
to other performance parameters. Furthermore, we achieve
almost twice the delay improvement of that achieved by RAR-
based timing optimizations. Our techniques are fast and scale
to large designs, whereas completely characterizing node func-
tionality with binary decision diagrams (BDDs) would require
a prohibitive memory footprint.
In Section II, we review the use of simulation to guide logic
optimization and summarize state-of-the-art synthesis strate-
gies. In Section III, we introduce our interconnect optimization
strategy. In Section IV, we propose a metric for ﬁnding circuit
paths that require restructuring. Section V introduces a novel
physically aware synthesis approach that uses simulation. Em-
piricalvalidationispresentedinSectionVII,andwesummarize
this paper in Section VIII.
II. BACKGROUND
This section describes how functional simulation can be used
to characterize the behavior of internal nodes in the circuit
and guide logic optimization. We then discuss a state-of-the-
art approach for logic synthesis, currently limited to the logic
domain that provides essential components for our physical
synthesis algorithms.
A. Simulation and Satisﬁability
A node F in a Boolean network can be viewed as a function
of primary inputs. Such a node can be characterized by its
signature SF for K input vectors X1,...,X K.
Deﬁnition 1: SF = {F(X1),...,F(XK)}, where F(Xi)=
{0,1}, indicates the output of F for a given input vector.
A carefully designed testbench or constrained-random simu-
lator can be used to generate vectors Xi and derive signatures
for each node in a circuit. For a network with n nodes, the time
complexity of generating signatures for the whole network is
O(K ∗ n). The functional nonequivalence of two nodes can be
determined by the following: SF  = SG ⇒ F  = G.
Signatures can be efﬁciently created and manipulated by
taking advantage of bit-parallel operations. Therefore, equal
signatures can be used to efﬁciently identify potential node
equivalences in a circuit by deriving a hash index for each
signature [19]. Since SF = SG does not imply that F = G,t h i s
potential equivalence must be veriﬁed, e.g., using a SAT solver,
as explained in the following.
The signature is a partial characterization of a node’s func-
tionality. Furthermore, the signature encodes all of the node’s
CDCs under the input vectors applied. The signature’s partial
characterization enables fast and aggressive optimizations with-
out requiring a fully speciﬁed truth table. However, unlike
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.PLAZA et al.: OPTIMIZING NONMONOTONIC INTERCONNECT USING SIMULATION AND RESTRUCTURING 2109
Fig. 2. Optimization by merging equivalent nodes in the presence of don’t
cares. 3-b signatures are shown at the output of each gate.
traditional correct-by-construction optimizations, these spec-
ulative transformations must be validated by a formal proof
mechanism. Hence, the efﬁciency of [19] and [23] depends on
the underlying engines which formally verify the equivalence
of nodes with identical signatures.
Recent advances in SAT solvers, e.g., learning, nonchrono-
logical backtracking, and watched literals [24], [30], have made
SAT a more scalable alternative to BDDs for equivalence
checking. The equivalence of two nodes F and G in a network
can be determined by constructing an XOR-based miter [5]
between them and asserting the output to 1, as shown in the
following:
(F = G) ⇔ (∀iF(Xi) ⊕ G(Xi)  =1 ) (1)
where

i Xi is the set of all possible input vectors.
In [19], input vectors are generated dynamically from coun-
terexamples returned by SAT checks proving F  = G.T h e
dynamic input vectors improve the quality of the signatures by
limiting situations where SF = SG even if F  = G.
B. Logic Optimizations With Signatures
Simulation is an effective means for quickly identifying can-
didates for optimization. In [28] and [37], signatures were used
to additionally encode ODCs to enable circuit simpliﬁcation
and optimization by merging equivalent nodes. Consider the
example in Fig. 2 which shows a circuit where logic simulation
produces the signatures shown. Notice that through efﬁcient
don’t-care computation using a fast linear-time simulation [28]
of downstream nodes, don’t-care values can be determined for
some of the signature’s positions. In the example, these don’t
cares suggest a potential circuit simpliﬁcation by merging two
nodes. The optimization will need to be veriﬁed by a formal
proof engine.
Despite these advantages, signature-based optimizations are
limited, and general synthesis algorithms have not been de-
veloped. A key contribution of this work is the application
of signatures to enable logic restructuring while relying on
available don’t-care computation algorithms.
Fig. 3. Two examples of AIG rewriting. With structural hashing, it is possible
in the second example to reuse external nodes and minimize the subgraph.
C. Logic Rewriting
Performing scalable logic optimization requires efﬁcient
netlist manipulation, typically involving only a small set of gate
primitives. Given a set of Boolean expressions that describes a
circuit, the goal of synthesis optimization is to minimize the
number of literals in the expressions along with the number
of logic levels. Several drawbacks of these techniques are
discussed in [22], including limited scalability. To this end,
Mishchenkoetal.[22]introducedanefﬁcientsynthesisstrategy
called rewriting. Logic rewriting is performed over a netlist
representation called an AND-inverter graph (AIG) [19], where
each node represents an AND gate and complemented (dotted)
edges represent inverters. In logic rewriting, the quality of
different functionally equivalent implementations for a small
logic block in a circuit is assessed. In Fig. 3, the top transfor-
mation leads to a reduction in area. By using a technique called
structural hashing [19], nodes in other parts of the circuit can
be reused. In the bottom example, there is a global reduction
in area by reusing available gates. However, structural hashing
requires that the circuit be represented as an AIG and is not
viable on mapped circuits.
The increasing signiﬁcance of wire delay is addressed by
providing more accurate delay models to logic synthesis, from
using wire-load models to maintaining companion placements
[10]. The delay model is used to modify the literal reduction
objective so that transformations or rewrites that improve the
delay according to the model are favored. However, delay esti-
mation is becoming more inaccurate before detailed placement
and routing as the actual interconnect routes become more
important. This trend suggests that new synthesis algorithms
should be applied after placement and routing because specula-
tive optimizations can actually increase delay while negatively
impacting other performance metrics like area.
III. OUR APPROACH
In this paper, we introduce a new synthesis approach that
accounts for physical aspects of performance optimization.
We show our approach in Fig. 4. Starting from a fully placed
circuit, we identify critical paths using static timing analysis.
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.2110 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 12, DECEMBER 2008
Fig. 4. Our approach to optimizing interconnect. First, we identify nonmonotonic critical path interconnect, and then, we restructure these paths to improve
delay. Such netlist transformations include gate cloning but are substantially more general. They do not require that the subcircuits in question be equivalent.
Instead, they use simulation and satisﬁability to ensure that the entire circuit remains equivalent to the original.
We then apply a novel metric introduced in Section IV that ﬁnds
subcircuits for which restructuring could provide the great-
est improvements. Next, we perform logic simulation using
an even distribution of input vectors and generate signatures
that encode don’t cares to obtain a partial characterization of
the functional behavior of the circuit. Using this functional
information encoded in signatures along with the physical
constraints, we efﬁciently derive a topology that is logically
equivalent to the original subcircuit but exhibits better perfor-
mance. Finally, we legalize the altered placement and update
the timing information in the circuit. As a result, we tailor our
path-monotonicity metric to ﬁnd portions of the critical path,
resulting in the greatest delay improvements. In addition, our
techniques can target other objectives as well.
Using signatures for restructuring is advantageous because
logic simulation provides a more scalable functional repre-
sentation than BDDs. Furthermore, signatures can characterize
internalnodesfornetlistsmappedtostandardcellsaswellasfor
technology-independent netlists. In contrast, the logic rewriting
strategy in [22] does not operate on technology-mapped circuits
and does not take physical information into account. We also
improve solution quality by considering more don’t cares while
being directly guided by physical constraints.
IV. IDENTIFYING NONMONOTONIC PATHS
To maximize the effectiveness of our postplacement opti-
mizations, we target timing critical parts of the design that are
amenable to restructuring. In this section, we introduce our fast
dynamic programming algorithm for ﬁnding paths in logic that
are nonmonotonic or paths that are not optimally short. Unlike
the work in [4], we consider paths of arbitrary lengths and scale
to many more segments in practice. We propose the following
two models for computing path monotonicity: 1) wirelength
based and 2) delay based. Nonmonotonic paths indicate regions
where interconnect and/or delay may be reduced by postplace-
ment optimization.
A. Path Monotonicity
First, static timing analysis is performed to enable our delay-
based monotonicity calculation and identify critical and near-
critical paths. We use a timing analyzer whose interconnect
delay calculation is based on Steiner-tree topologies produced
by FLUTE [12]1 and the D2M delay metric [2] that is known to
be more accurate than Elmore delay. Before focusing on critical
paths, we will describe a general approach that examines the
monotonicity of every path. We deﬁne the nonmonotone factor
(NMF) for the path {x1,...,x k} with respect to a given cost
metric (such as wirelength or delay) as follows:
NMF =
1
cideal(x1,x k)
k−1 
n=1
c(xn,x n+1) (2)
where c(a,b) deﬁnes the actual cost between a and b and cideal
deﬁnes an optimal cost. When NMF =1 , the path is monotonic
under the cost metric. We explore two deﬁnitions for cost, i.e.,
one based on rectilinear distance and another on delay.
For the rectilinear case, c(a,b) is the rectilinear distance
between cell a and b, while cideal(a,b) is the optimal rectilin-
ear distance assuming a monotonic path. For the delay-based
deﬁnition, c(a,b) is the AT(b) − AT(a), where AT is the arrival
time. We deﬁne cideal as the delay of an optimally buffered path
between a and b, as described by Otten and Brayton [25] and
given by the following:
cideal(a,b)=dist(a,b)

RbufC+RCbuf+

2RbufCbufRC

(3)
where R and C are the wire resistance and capacitance and
Rbuf and Cbuf are the intrinsic resistance and input capacitance
of the buffers, respectively. dist(a,b) is the rectilinear distance
between a and b. Unlike the distance calculation where the
ideal path length between a and b can be equal to the actual
path length, the optimal buffered wire between a and b has
delay ≤ AT(b) − AT(a). We only attempt to optimize paths
with large NMFs.
B. Calculating NMFs
We now present our algorithm for calculating the NMF of
all k-hop paths in a circuit, for a given k ≥ 2. Our experiments
indicate that the greatest NMFs are often observed on relatively
short paths, and optimizing such paths brings greatest beneﬁts.
The NMF can be efﬁciently computed for every path using a
O(K ∗ n)-time algorithm for n nodes in the circuit, as shown
1Timing-driven Steiner trees can be easily used instead [3].
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.PLAZA et al.: OPTIMIZING NONMONOTONIC INTERCONNECT USING SIMULATION AND RESTRUCTURING 2111
Fig. 5. Generating NMFs for a k-hop paths.
Fig. 6. Calculating the NMF for path {d,h}. The matrix shows computations
performed while executing the algorithm in Fig. 5.
in Fig. 5. First, the circuit is levelized. Then, cideal is computed
for node pairings with a connecting path of ≤ k hops, and the
values are stored in c_ideal_array. All pairs are traversed
again, and the subtot is generated by computing the maximum
cost from node1 to node2 through a recurrence relation. The
NMF is computed for the subpath {node1,node2} by dividing
the total cost subtot by c_ideal[node1,node2].I nF i g .6 ,w e
show a subcircuit being traversed using the gen_NMF function,
where k =3and the current node1 is d. The matrix indicates
the NMFs already computed with #, and nodes not lying on the
same path with X. Because we traverse the graph in levelized
order, a, b, and c have already been examined. Notice that
nodes that are farther than k hops away are not examined
(indicated by K in the matrix). For node d, the NMF is com-
puted for path d–h by determining all the incoming subpaths
to h ﬁrst.
V. P HYSICALLY AWARE LOGIC RESTRUCTURING
We optimize the subcircuits that are identiﬁed by the path
monotonicity metric, as shown in Fig. 7. We ﬁrst select a region
of logic determined by the nonmonotonic path for resynthesis.
We then use signatures to ﬁnd an alternative implementation
with a topology that improves physical parameters and that
is logically equivalent to the original implementation (up to
the signatures). This implementation is then formally veriﬁed
by performing SAT-based equivalence checking between the
original and new netlists.
Previous work on improving path monotonicity used logic
replication [16]. However, the technique is restricted to the
topology of the extracted subcircuit, and its optimization is
independent of the subcircuit’s functionality. Furthermore, as
observed in [16], gate relocation sometimes cannot improve
path monotonicity. In the following, we introduce the theoreti-
cal framework to resynthesize a subcircuit given a set of inputs
and a target output by introducing a concept called logical
feasibility.Wethenintroduceanalgorithmforconstructingsub-
circuits using signatures and physical constraints to optimize
the interconnect.
A. Determining Logical Feasibility With Signatures
We introduce a goal-driven synthesis strategy that efﬁciently
ﬁnds a logic implementation for a given physical topology.
The major thrust of previous efforts in postplacement logic
optimization involves the efﬁcient encoding of logic func-
tionality and, in particular, circuit don’t cares. Kravets and
Sakallah [20] proposed a technique to enumerate through the
decompositions of a particular node using BDDs. By encoding
the ways of decomposing a node with BDDs, the authors
provide an algorithm for resynthesizing logic that can work
on mapped netlists using different standard-cell libraries. No
strategy is considered for exploiting global circuit don’t cares
which could be used to enhance the quality of the decom-
positions considered. In [35], sets of pairs of functions to be
distinguished (SPFDs) are introduced as a way of representing
a node’s functionality which can be used to exploit circuit
ﬂexibility in logic optimization. Sinha et al. [31] propose a
technique that uses SPFDs to ﬁnd a logic implementation
given a topological constraint, but their resynthesis approach
does not incorporate physical parameters such as timing and is
limited to only a few neighboring levels of logic to reduce the
memory and computational requirements of SPFDs. In an alter-
native strategy to reduce the memory requirements of SPFDs,
Yang et al. [36] choose a subset of SPFDs for a node using
simulation and compatibility don’t cares in a logic rewriting
application.
In this paper, we use logic signatures to expose circuit
functionality. Our approach is advantageous because the data
structures involved in our technique do not need to represent
an exponential amount of information. This is generally the
case regardless of the underlying functionality. Through logic
simulation, we can encode global circuit don’t cares which are
not limited by levels of logic or required to be compatible.
Furthermore, our approach encodes the distinguishing bits in
a compact data structure with logic signatures so that these
operations can be performed with bitwise parallelism. This
is particularly beneﬁcial in our development of a novel goal-
driven synthesis technique where fast evaluation of topological
constraints while exploiting don’t cares is essential to tightly
couple physical optimization and logic synthesis.
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.2112 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 12, DECEMBER 2008
Fig. 7. Our ﬂow for restructuring nonmonotonic interconnect. We extract a subcircuit determined by our nonmonotonic metric and ﬁnd topologies that are
logically equivalent using simulation. This new implementation is then veriﬁed by equivalence checking with an incremental SAT solver.
Given an extracted subcircuit with χ inputs, {a1,a 2,...,
aχ}, and output F to resynthesize, we express a candidate
restructuring as a directed graph TF with χ incoming edges,
one outgoing edge F, and n internal vertices. We would like to
determinewhetherthereisalabelingG∗ ofnverticeswithgates
g ∈ G such that F is logically equivalent to the subcircuit TF c
that implements TF with respect to the outputs of the circuit.
We deﬁne the logical feasibility of TF as follows.
Deﬁnition 2: TF is logically feasible iff
∃G∗onset(TF c)=o n s e t ( F)
where onset represents where the subcircuit produces 1 for an
input combination. This deﬁnition can be relaxed by consider-
ing its relation within the care set which could be considerably
smaller than 2χ due to circuit don’t cares.
Deﬁnition 3: TF is logically feasible up to circuit don’t cares
iff ∃G∗onset(TF c) ∪ dc(F)=o n s e t ( F) ∪ dc(F), where dc is
the don’t-care set.
A naive algorithm for determining the logical feasibility
of TF requires that every possible labeling G∗ is tried. For
n vertices, this requires checking |G|n mappings. If the set of
two-inputlogicfunctionsisconsidered,thereare5n mappings.2
Furthermore, performing equivalence checking between TF c
and F is an NP-complete problem. In the following, we discuss
how signatures can be used to determine a set of inputs that
implements a given function and how to quickly determine
logical feasibility up to the signature approximation.
PBDs: A function F is said to be dependent on an input ai
if and only if
Fai=0 ⊕ Fai=1  =0 . (4)
A similar relationship between a signature Sf and input
signaturesS1,...,S χ canbeestablished.In[7],itwasobserved
that a set of input signatures can implement a target signature if
and only if every pair of different bits in Sf is distinguished by
at least one Sχ.
Deﬁnition 4: A pair of bits to be distinguished (PBD) is a
pair {i,j} such that Sf(i)  = Sf(j).
Deﬁnition 5: A candidate signature Sχ distinguishes a PBD
in Sf if Sx(i)  = Sx(j), where {i,j}∈SPBD
f , with SPBD
f
being f’s set of PBDs.
2Although there are 16 different two-input Boolean functions, the tautology
and two one variable transfer functions along with the negated form of each
function do not need to be explicitly considered.
Example 1: Assume a target signal Sf = {0,0,1,1}
and candidates S1 = {0,0,0,1}, S2 = {0,1,0,1}, and S3 =
{0,1,1,1}. The PBDs of Sf are {0,2}, {0,3}, {1,2}, and
{1,3} that need to be distinguished. Note that S1 and S2
together cannot implement Sf because they do not distinguish
{0,2}. However, if all Sχ are used, there exists a function that
gives Sf. In this example, Sf = S3 · (S1¯ ⊕S2). 
Essential PBDs: Input signatures form an irredundant cover
of Sf’s PBDs when the following are satisﬁed: 1) Every PBD
is covered by at least one Si, and 2) removing one Si results
in at least one uncovered PBD. The resulting Si’s form the
support of the function to be resynthesized. We deﬁne a PBD
that is distinguished by only one Si as an essential PBD for Si.
According to the deﬁnition of an irredundant cover and PBDs,
each Si must have at least one essential PBD (or else, that
input can be discarded). Because there is at least one essential
PBD for each input, Sf is dependent on Si, independent of the
speciﬁc implementation, if the following condition holds:
Sf(Si=0) ⊕ Sf(Si=1) =1 . (5)
In the case of the resynthesis of a function F(a1,...,a χ),w e
note that the cardinality of the irredundant cover can be less
than χ because F may be independent of an ai up to don’t cares
and the signature abstraction might not expose enough essential
PBDs. Furthermore, several irredundant covers are possible. In
this paper, we greedily determine irredundant covers by ﬁrst
selecting signatures that cover most PBDs and continuing until
all PBDs are covered.
Determining Logical Feasibility With Essential PBDs: We
now describe how the logical feasibility of a given topology
can be determined using signatures. Later, we will explain how
to create such topologies and how to verify the signature-based
abstraction. Our strategy considers the set of available gates G
as implementing all the two-input logic functions, so that
each node n has exactly two input edges. In general, we do
not restrict our topologies to be fan-out-free trees, where a
topology is fan-out free if each node n in TF has only one out-
going edge.
However, fan-out-free topologies (where we make the addi-
tional constraint that each primary input has only one outgoing
edge)formacriticalaspectofourgoal-drivensynthesisstrategy
because, under a couple of assumptions, they produce circuits
withoptimalareaandtimingifsuchafan-out-freecircuitexists.
First, we assume that the area associated with each node/gate
in the topology is equal (since the implementation of the
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.PLAZA et al.: OPTIMIZING NONMONOTONIC INTERCONNECT USING SIMULATION AND RESTRUCTURING 2113
topology is unknown). Second, the delay through the subcircuit
is determined by its path length through the topology, where we
assume that each wire corresponding to an edge is optimally
buffered. Therefore, fan-out-free topologies have smaller area
than their nonfan-out-free counterparts when implementing a
single-output function because they have fewer internal nodes
(χ − 1 nodes). Furthermore, fan-out-free topologies have the
same or smaller delay as nonfan-out free trees. The proof of
this is straightforward because if a reconvergent topology has
optimal delay based on path length, converting this topology
to a fan-out-free tree by removing edges and nodes will not
increase path length.
In the next few paragraphs, we introduce an algorithm for
determining logical feasibility on fan-out-free circuits where
each primary input has only one outgoing edge F, which can
be performed with an O(|SPBD
F |∗χ)-time algorithm using
signatures. Because logical feasibility is not always possible
for a fan-out-free tree that optimizes a particular performance
criterion,weextend oursynthesistechniques tohandlearbitrary
nontree topologies.
First, we associate a signature to each input χ of TF. These
signatures implicitly handle CDCs as impossible input combi-
nations which will never occur in the signatures. By simulating
downstream nodes as in [28], observability don’t cares are
derived, and Sf is reduced to include only care values. If we
assume that each Si under simulation distinguishes at least one
essential PBD, we note the following for each two-input gate in
a fan-out-free topology.
Theorem 1: For input signatures S1 and S2 and the two-
input function Φ, the signature S1,2 =Φ ( S1,S 2) has S1 and
S2 essential PBDs.
Proof: Any cut through TF gives a set of inputs that
implements F. Therefore, the SPBD
F must be distinguished by
each cut in Tc
F for a feasible topology. Since, in a fan-out-
free topology, S1 and S2 do not reoccur in the topology, the
output of the node combining S1 and S2, S12, must contain
their essential PBDs to distinguish SF. 
As a direct consequence, each two-input transformation pre-
servesatleasttwoessentialPBDs.Furthermore, PBDsthatonly
occur in both S1 and S2 must also be preserved to uphold
the invariant that every cut through the topology forms an
input support. In a similar manner, the work in [31] upholds
this invariant in constructing a subcircuit but considers SPFDs
instead. We note the following.
Theorem 2: There are at most two two-input Boolean func-
tions (ignoring the negated version of these functions) that can
preserve all the essential PBDs of the input signatures.
Proof: A two-input Boolean function has four-row truth
table with output 0 or 1. One essential PBD adds the following
constraint:
[Φ(a,b)=z] ∧ [Φ(a ,b)=z ] (6)
where a, b, and z are variables with value 0 or 1. In other
words, two different rows of the truth table must have different
values. For the given a and b where an essential PBD is
deﬁned, there are only two such assignments to z that satisfy
this constraint. The remaining two rows in the truth table
can have any of four possible output combinations. Therefore,
there is a total of eight different functions that satisfy this
constraint. We ignore the negated versions of the Boolean
function since that negation can be propagated to the inputs
of later gates. Given this, there are four distinct functions that
can preserve one essential PBD. However, since two essential
PBDs must be preserved, the following constraint needs to be
satisﬁed:
[Φ(a,b)=z] ∧ [Φ(a ,b)=z ] ∧ [Φ(d,e)=y] ∧ [Φ(d,e )=y ].
(7)
If {(a,b),(a ,b)} is disjoint from {(d,e),(d,e )}, there are
only four possible output combinations of z and y that satisfy
the constraints, where two of them are the negated form.
This is also the case if {(a,b),(a ,b)} is not disjoint from
{(d,e),(d,e )} (it is impossible for two different functions
to have essential PBDs on the same two rows). Therefore,
there are at most only two distinct Boolean functions that can
preserve the essential PBDs of its inputs. 
If the fan-out-free tree is traversed in topological order, a
choice between two different two-input gates is available for
each node. In the worst case, all possible combinations must be
tried to preserve all the essential PBDs giving O(|SPBD
F |2χ)-
time complexity (there are χ − 1 nodes). For the typically small
topologies that are considered for resynthesizing portions of
the critical paths, this results in signiﬁcant practical runtime
improvement overtryingallpossiblegatecombinationswithout
considering PBDs. However, we note that, in many cases, the
runtime complexity is linear.
Theorem 3: The logical feasibility of an χ-input fan-out-
free TF can be determined in O(SPBD
F ∗ χ) time when K
simulation vectors completely specify the functionality of F.
Proof: A fan-out-free topology speciﬁes a disjoint parti-
tion of the inputs. If an implementation exists with a disjoint
partitioning of inputs, each internal node corresponds to a
function that is speciﬁed independently of the rest of the imple-
mentation. Therefore, when the signatures completely specify
F (a complete truth table), each internal node is also completely
speciﬁed. Because of this, each two-input operation must pre-
serve at least three essential PBDs (the minimal number of
distinguishing bits that a two-input function can have), and
therefore,onlyonefunctionsatisﬁesthisrelation.Becausethere
is only one such candidate function, the complexity of ﬁnding
an implementation is O(SPBD
F ∗ χ). 
Although we often resynthesize functions with small sup-
ports and, therefore, small truth tables, a logic signature does
not always completely specify a function’s behavior, resulting
in a reduction in the number of bits that need to be distin-
guished. Also, the ability of simulation to quickly identify
circuit don’t cares further reduces the number of bits that
need to be distinguished. By not having a completely speciﬁed
function, we facilitate multiple feasible implementations. De-
spite the advantages of this ﬂexibility in determining a feasible
implementation, an internal two-input operation may only need
to preserve two essential PBDs rather than three, which can
increase the runtime of ﬁnding an implementation. However, in
practice, this runtime penalty is minor because the topologies
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.2114 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 12, DECEMBER 2008
are typically small. Also, in many cases, logical feasibility can
still be determined in O(SPBD
F ∗ χ) time, depending on which
bits need to be distinguished.
Although, in this paper, we use a functionally complete set
of two-input gates, our approach extends to other standard-cell
libraries. We now explain how to accommodate larger cells.
First, we allow topologies where each node can have more
than two incoming edges. Then, each node with more than two
incoming edges is decomposed into nodes that represent two-
input gates. Finally, this implementation is mapped to a set of
library cells using structural matching.
In some cases, a topology optimizing a certain performance
objective may be logically infeasible. Furthermore, some func-
tions, e.g., z=a b + ac (a multiplexor), cannot be implemented
using a fan-out-free topology. Therefore, a viable technique
must handle a broader family of topologies. In the case of the
multiplexor, notice that only signal a has fan-out, while b and
c only occur once. We now describe how essential PBDs can
be used to guide synthesis for nontree topologies where each
operation preserves at least one of its inputs’ essential PBDs.
This facilitates reconvergence and the implementation of useful
functions including multiplexors, as shown in the following.
Theorem 4: The logical feasibility of an n-node topology TF
can be determined in O(|SF|PBD ∗ 3χ) time for K simulation
vectors under the following conditions.
1) Atleastoneinputtoeachnodedoesnotfan-outtoanother
node at the same or greater logic level.
2) Only implementations are considered where the signa-
tures along each cut through the topology form an irre-
dundant cover.3
The logic level of a node is determined by the path from the
node to the primary inputs with the greatest number of edges.
Proof: By traversing the graph in topological order, note
that at least one essential PBD is transferred to the output.
Also, when those implementations are considered where the
signatures along each cut of the topology form an irredundant
cover, each signature along the cut has at least one essential
PBD. The constraints in (6) suggest that there are four distinct
two-input functions that preserve one essential PBD. However,
one of these functions will correspond to the one input identity
function, i.e., a buffer (or inverter in the negated case). Ignoring
this case, there are three distinct functions that can be tried at
each node, which requires no more than 3n total gate combina-
tions to determine logic feasibility. 
Handling arbitrary topologies with no implementation con-
straints requires more computation where 5n gate combinations
are examined. However, in practice, our approach is faster than
the naive enumeration described at the beginning of the section
because the operations are performed on the signatures, not
over the whole truth table. Also, essential PBDs can still sig-
niﬁcantly prune the search space. Each cut must still cover all
of the PBDs. If an edge from the internal node or primary input
does not appear past a certain logic level in the topology, its
signature’s essential PBDs must be preserved across that level.
3In general, a topology may have an implementation with redundant covers.
However, we focus on implementations that do not use this redundancy to
improve the efﬁciency of our approach.
Fig. 8. Extracting a subcircuit for resynthesis from a nonmonotonic path.
B. Subcircuit Extraction
After identifying the most nonmonotonic path, we extract a
subcircuit, as shown in Fig. 8, where the inputs of the subcircuit
are the incoming edges to the path and the output has outgoing
edges from the end of the path. The inputs and fan-out of the
subcircuit are treated as ﬁxed cells, which form the physical
constraints. As shown in the ﬁgure, if there are outgoing edges
at intermediate nodes in the path, this logic is duplicated. In
practice, we experience minimal cell area increase because few
cells are duplicated, and the resynthesized circuit is sometimes
smaller than the original one.
C. Physically Guided Topology Construction
In addition to efﬁciently determining the logical feasibility
of various topologies, we propose an algorithm that uses PBDs
and physical constraints to efﬁciently construct logically fea-
sible topologies. In this paper, we guide our approach using
delay and physical proximity. In the example shown in Fig. 9,
we try to ﬁnd an optimal restructuring to implement the target
function F with the inputs a, b, and c, using signatures. The
functionality of the original circuit is represented by signatures,
and a table is associated with each signal showing the PBDs
that are distinguished. The nonessential PBDs for each input
signature have light-gray background.
The example shows that the arrival time for c is the greatest,
followed by a, and then b. Therefore, we ﬁrst consider a
topology where c’s value is required later. We also consider
the proximity of the signals and therefore examine a topology
where an operation between a and b is performed. Notice that if
all possible two-input operations are tried, the essential PBDs
are not preserved, and hence, this is not a feasible topology. We
then consider another topology where a can be consumed later
because no topology exists where c is consumed last. For this
topology, we see that an XOR gate will preserve the essential
PBDs. We then easily determine that an OR gate is needed to
implement F.
Algorithm: We introduce the pseudocode of our al-
gorithm for restructuring nonmonotonic interconnect in
Fig. 10. After identifying the nonmonotonic paths, Optimize_
Interconnect restructures a portion of the critical path. We
ﬁrst simplify the signatures by simplify_signatures by
noting that the size of the signature |SF| can be reduced to
the number of different input combinations that occur across
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.PLAZA et al.: OPTIMIZING NONMONOTONIC INTERCONNECT USING SIMULATION AND RESTRUCTURING 2115
Fig. 9. Signatures and topology constraints guide logic restructuring to improve critical path delay. The ﬁgure shows the signatures for the inputs of the topology
to be derived along with the output. Each table represents the PBDs of the output F that are distinguished. The topology that applies a and b is infeasible because
it does not preserve the essential PBDs of a and b. A feasible topology uses b and c, followed by a.
Fig. 10. Restructuring nonmonotonic interconnect.
{S1,...S i}. Thus, only a subset of the signature is needed for
restructuring because the small subcircuits considered have a
maximum of 2i possible different input combinations, smaller
than the number of simulation vectors applied.4
We then add any timing or physical constraints, such as
locations of the inputs and outputs of the subcircuit being
restructured. In find_opt_topology, we ﬁnd a topology that
satisﬁes all the physical constraints and optimizes delay. The
topology is created by a greedy algorithm which derives a fan-
out-free topology from the current input wires. We examine
eachpairofwires,applyanarbitrarycell,andestimatethedelay
to the output of the subcircuit. The topology is then greedily
constructed so that wire pairs that produce earlier arrival times
are consumed farther from the output of the topology. We
will later discuss how to construct arbitrary nontree topologies.
From this topology, we can get an upper bound for the best im-
plementation possible that contains the examined combination.
If a topology cannot be found that satisﬁes the constraints, the
function returns.
We then check the logical feasibility using PBDs and sig-
natures in check_logical_feasibility. If the topology is
feasible, we associate the appropriate gate with each vertex and
4In our experiments, we apply 2048 input vectors and restructure subcircuits
with less than ten inputs.
place the subcircuit. Our placement routine considers only the
legality of the subcircuit (we will call a placement legalizer
later for the entire design). In our approach, we determine a
location for each gate by placing it at the center of gravity of its
inputs and outputs and then sifting the gate to different nearby
locations. This sifting is done over all the gates over several
passes until a locally optimal solution is achieved, which results
in no overlaps. For the typically small subcircuits considered,
this requires little computational effort.
If the topology is not logically feasible, we add a func-
tional constraint that will prevent the construction of similar
topologies. The constraint states which wire pairs should not
be combined again. For instance, for the multiplexor, z =
a b + ac, there is no implementation for a fan-out-free topology
with inputs {a,b,c}.I fa and b form a wire pair, we see that
no implementation preserves its essential PBDs. However, we
can exploit Theorem 4 and consider implementations that can
eliminate one of the inputs. In this case, if the implementation
a b is tried, the wire b does not need to reappear in the topology.
Therefore, a constraint is added so that the inputs to the topol-
ogy are now {a b,a,c}. With these inputs, a fan-out-free tree
does exist which is logically feasible.
If Optimize_Interconnect returns a subcircuit, we check
the equivalence of the entire circuit using a SAT engine. In
the case where our candidate produces a functionally different
circuit (which is rare, as shown in Section VII), we use the
counterexample generated by SAT to reﬁne our simulation,
hence improving the signatures’ quality. If the resulting sub-
circuit passes veriﬁcation, we update the netlist and legalize the
placement. We update the timing information and the NMFs if
a new critical path is found, in which case we select with the
next highest NMF and restructure it.
D. Efﬁcient Subcircuit Veriﬁcation
Because we use signatures to limit veriﬁcation of opti-
mization candidates that are most likely correct, equivalence
checking typically conﬁrms the transformation. As in [9], we
reﬁne simulation using counterexamples found by failed equiv-
alence checks so as to reduce additional failed checks. We also
minimize the veriﬁcation time due to equivalence checking by
considering only the portions of logic that contribute to the
don’t cares used in the transformation. As explained in [28],
several don’t cares can exist within a few levels of logic. We
invoke a SAT engine so that it considers only these necessary
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.2116 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 12, DECEMBER 2008
levels of downstream logic. Additionally, we could restrict the
equivalence checking to a window around the optimization
location to further reduce veriﬁcation time while still utilizing
CDCs and ODCs in the circuit.
However, in practice, we observe that the SAT-based equiv-
alence checking requires a small percentage of runtime com-
pared to constructing optimal topologies even for our larger
circuit examples. This small runtime can be attributed to the
locality of most of our structural transformations. Because the
structures of the original and modiﬁed circuits are similar,
the SAT instance can be greatly reduced in size and complexity.
This limits the complexity of our approach, which tends not to
grow with the size of the overall circuit.
VI. ENHANCING RESYNTHESIS THROUGH GLOBAL
SIGNATURE MATCHING
Our resynthesis strategy considers the inputs to a non-
monotonic path for resynthesis. This strategy is convenient
because of the following reasons: 1) The set of inputs can
always implement the target output, and 2) the inputs tend to
be physically close to the target output. However, local manip-
ulations can be enhanced by incorporating global information,
as in logic rewriting which uses structural hashing [22]. In
this section, we explain how to exploit the same advantages as
structural hashing by performing matching up to the signature
abstraction. Furthermore, our approach is more powerful than
logic rewriting because the signatures are matched up to global
don’t cares, and our initial physically guided local rewriting
over signatures already exploits don’t cares. We observe that
our enhancement is no worse than the algorithm from [27] but
appears more robust and predictable.
Algorithm: In the following, we outline how signature
matching is used in the resynthesis of nonmonotonic paths.
1) Find a set of candidate wires within a certain distance of
the output wire to be resynthesized.
2) Check whether any of these wires’ signatures is equal to
the output signature up to don’t cares. If a match is found
and the timing improves, replace the output wire with the
corresponding candidate wire.
3) While checking logic feasibility in topological order,
check whether any of the internal wires of the topology
can be reimplemented by a candidate wire with a match-
ing signature so that the timing is improved.
The candidate wires are chosen by proximity to the output
wire being resynthesized as determined by its half-perimeter
wirelength (HPWL). Any wire that has arrival time after the
current output wire’s arrival time is not considered. Unlike
the resynthesis algorithm that uses a simpliﬁed signature, for
signature matching, we consider the whole signature except
for the don’t cares. In this case, a single comparison between
signatures can be performed quickly and is more efﬁcient
than ﬁnding a common set of inputs to both wires and then
reducing the signatures to the number of simulated different
input combinations. Notice that our algorithm enhances the
previous resynthesis strategy and improves the timing of an
implementation, whereas topology construction only considers
the inputs to the subcircuit.
Fig. 11. Above graph shows the percentage of paths whose NMF is below a
g i v e nv a l u eo nt h ex-axis. Notice that longer paths tend to be nonmonotonic
and that at least 1% of paths are more than ﬁve times the ideal minimal length.
VII. EXPERIMENTS
We implemented and tested our algorithms with circuits from
the IWLS 2005 benchmark suite [39], with design utilization
being set to 70% to match recent practices in the industry.
Our wire and gate characterizations are based on a 0.18-μm
technology library. We perform static timing analysis using
the D2M delay metric [2] on rectilinear Steiner minimal trees
produced by FLUTE [12]; here, FLUTE can be easily replaced
by a timing-driven subroutine, but we do not expect the over-
all trends in our experiments to change signiﬁcantly. Our
netlist transformations are veriﬁed using a modiﬁed version of
MiniSAT [13] and placed using Capo 10 [6]. We have consid-
ered several different initial placements for each circuit by
varying a random seed in Capo and report results as averages
over these placements. Our netlist transformations are legalized
using the legalizer provided by the GSRC Bookshelf [41].
Our delay improvements are achieved by executing the algo-
rithm in Fig. 10. We applied 2048 random simulation patterns
initially to generate the signatures. We considered paths of less
than or equal to four hops (ﬁve nodes) using our delay-based
metric which allowed us to ﬁnd many nonmonotonic paths
while minimizing the size of the transformations considered.
We conducted several optimization passes until no more gains
were achieved.
A. Prevalence of Nonmonotonic Interconnect
Our experiments indicate that circuits often contain many
nonmonotonic paths. In Fig. 11, we show a cumulative distri-
bution of the percentage of paths whose NMFs are below the
corresponding value on the x-axis. We generated these averages
over all the circuits in Table I. Each line represents a different
path length examined, where we considered paths from two
to six hops using the wirelength-based NMF metric. We also
show the cumulative distribution for the four-hop delay-based
NMF calculation used to guide our delay-based restructuring.
Of particular interest is the percentage of monotonic paths, i.e.,
paths with NMF =1 .
Notice that smaller paths of two hops are mostly monotonic,
whereas the percentage of monotonic paths decreases to 23%
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.PLAZA et al.: OPTIMIZING NONMONOTONIC INTERCONNECT USING SIMULATION AND RESTRUCTURING 2117
TABLE I
SIGNIFICANT DELAY IMPROVEMENT IS TYPICALLY ACCOMPANIED BY A SMALL WIRELENGTH INCREASE
when paths with more logic levels are considered (six-hop
paths). This indicates that focusing optimizations on small
paths only, as in [4], can miss several optimization opportu-
nities. It is also interesting to note that there are paths with
considerably worse monotonicity having NMFs that are greater
than ﬁve, indicating regions where interconnect optimizations
are needed. We observe similar trends using our delay-based
metric. The inclusion of gate delay on these paths results
in greater nonmonotonicity when compared to the wirelength
metric. Although not shown, each individual circuit exhibits
similar trends.
B. Physically Aware Restructuring
We show the effectiveness of our delay-based optimization
by reporting the delay improvements achieved over several
circuits. In Table I, we provide the number of cells and nets
for each benchmark. In the Performance columns, we give
the percentage delay improvement, the runtime in seconds, and
the percentage of equivalence checking calls where candidate
subcircuits preserved the functionality of the whole circuit. We
also report the overhead of our techniques in terms of increased
wirelength and area (cell count).
Considering eight independently generated initial place-
ments for each circuit, our techniques improve delay by 11.7%
on average. For some circuits, such as s35932, several don’t-
care enhanced optimizations enabled even greater delay im-
provements. We observe the following.
1) By optimizing only one output of a given subcircuit,
we greatly reduce the arrival time of that output while
only slightly degrading the performance of less critical
outputs.
2) Through our efﬁcient use of don’t cares, several χ-input
subcircuits could be restructured to require fewer than
χ inputs.
3) As a special case of the previous point, sometimes, an
input to the subcircuit was functionally equivalent to the
output of the subcircuit when don’t cares were consid-
ered, enabling delay reduction along with removal of
unnecessary logic. Signatures are efﬁcient in exploiting
these opportunities.
4) The decomposition of large gates into smaller gate prim-
itives through our restructuring algorithm often produces
better topologies because we more precisely construct a
topology to meet the physical constraints.
5) We also expect gains due to the duplication and relocation
of some cells.
We believe that further gains would be enabled by combining
buffering, relocation, and gate sizing strategies between our re-
structuring optimizations. The runtime of our algorithm scales
well for large circuits due to the use of logic simulation as the
main optimization engine. Furthermore, the high percentage
of equivalence checking calls that veriﬁed the equivalence of
our transformations (as shown by column %equi in Table I)
indicates that signatures are effective at ﬁnding functionally
equivalent candidates. The wirelength and cell-count overhead
are minimal because only a few restructurings are needed and
the optimizations can simplify portions of logic. In some cases,
the number of cells is reduced.
To check if our techniques provide comparable improve-
ment when the initial placement is optimized for timing, we
performed the following experiment. We ﬁrst produced 64
independent initial placements optimized for total wirelength.
Compared with these 64 wirelength-optimized placements, the
best placements achieve 17.0% smaller delay on average and
serve as proxies for timing-optimized placements in our experi-
ments.Startingwiththebest placements, ourlogicrestructuring
further decreased delay by 6.5%.
In Fig. 12, we show that our delay-based NMF metric is
effective at guiding optimization. Each data point represents a
different resynthesis try considering all of the circuits in Table I.
The x-axis shows the predicted percentage delay gain possible
(determined by the optimal buffered delay). The y-axis indi-
cates the actual gain. Data points that lie on the x-axis indicate
resynthesis tries that did not improve delay (a better topology
could not be found). The 50% threshold line divides the graph
so that the numbers of resynthesis attempts are equal on both
sides. The diagonal line indicates an upper bound prediction
for delay gain. Because some of the optimizations reduce the
support of the original subcircuit, we can improve the delay
beyond the estimate which considers all of the subcircuit’s
inputs. Therefore, some of the data points are above the upper
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.2118 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 12, DECEMBER 2008
Fig. 12. Graph above illustrates that the largest actual delay improvements
occur at portions of the critical path with the largest estimated gain using
our metric. The data points are accumulated gains achieved by 400 different
resynthesis attempts when optimizing the circuits in Table I.
bound line. On the other hand, a resynthesis try produces a
smaller-than-estimated improvement when the ideal topology is
not logically feasible or when removing cell overlap degrades
the quality of the initial placement. Although the NMF and gain
calculations do not directly incorporate circuit functionality,
74% of all delay gains are found on the right half of the graph.
The correlation to our metric could be further improved by
incorporating the percentage of gain possible with respect to
near-critical paths.
C. Comparison With RAR
We compare our technique with timing optimizations using
RAR. We implement redundancy removal using signatures [28]
to identify equivalent nodes up to don’t cares. In the context of
path-based resynthesis, the inputs to the subcircuit, along with
signals that have earlier arrival time and are within a bounding
box determined by the HPWL of the output, are considered as
candidates for rewiring. If one of these signals is equivalent to
the output up to don’t cares in the circuit, rewiring is performed,
and the timing is improved.
In Table II, we show the delay improvement of our resyn-
thesis strategy which uses global signature matching to RAR.
For this experiment, we report results on a random slice of
initial placements from our suite. First, note that our technique
is almost twice as effective at improving delay. Furthermore,
our results are more consistent than RAR over all the circuits
and are never worse.
VIII. CONCLUSION
Interconnect delay is becoming a major obstacle for achiev-
ing timing closure, typically requiring numerous expensive de-
sign iterations. Current logic synthesis strategies often sacriﬁce
other performance metrics to improve delay, requiring com-
putationally expensive algorithms and companion placements.
Despite these efforts, extensive postplacement optimizations
are still needed, particularly since buffers will represent a large
fraction of standard cells in future technologies [29].
TABLE II
EFFECTIVENESS OF OUR APPROACH COMPARED TO RAR
We propose a solution that improves the quality of delay
optimization without sacriﬁcing other performance metrics. To
this end, we introduce a novel simulation-guided synthesis
strategy that is more comprehensive than current restructuring
techniques. We develop a path-monotonicity metric to focus
our efforts on the most important parts of a design. Our
optimizations lead to 11.7% delay improvement on average
over several different initial placements, while our delay-based
monotonicity metric indicated that 65% of the paths analyzed
were nonmonotonic. We further observe delay improvements
on placements initially optimized for delay, which are con-
sistent with our reported average improvement. We believe
that our approach offers an effective bridge between current
topological-based synthesis and lower level physical synthesis
approaches. It enables less conservative estimates early in the
design ﬂow to improve other performance metrics and reduce
the amount of buffering required by shortening critical paths.
Future work will explore the beneﬁts of using our technique
with other physical synthesis strategies such as buffering.
REFERENCES
[1] A. AjamiandM.Pedram,“Post-layout timing-drivencell placementusing
an accurate net length model with movable Steiner points,” in Proc. DAC,
2001, pp. 595–600.
[2] C.Alpert,A.Devgan,andC.Kashyap,“RCdelaymetricsforperformance
optimization,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.,
vol. 20, no. 5, pp. 571–582, May 2001.
[3] C. Alpert, A. Kahng, C. Sze, and Q. Wang, “Timing-driven Steiner trees
are (practically) free,” in Proc. DAC, 2006, pp. 389–392.
[4] G. Beraudo and J. Lillis, “Timing optimization of FPGA placements by
logic replication,” in Proc. DAC, 2003, pp. 196–201.
[5] D. Brand, “Veriﬁcation of large synthesized designs,” in Proc. ICCAD,
1993, pp. 534–537.
[6] A. Caldwell, A. Kahng, and I. Markov, “Can recursive bisection alone
produce routable placements?” in Proc. DAC, 2000, pp. 693–698.
[7] K.-H. Chang, I. Markov, and V. Bertacco, “Fixing design errors with
counterexamplesandresynthesis,”inProc.ASP-DAC,2007,pp.944–949.
[8] K.-H. Chang, I. Markov, and V. Bertacco, “Safe delay optimization for
physical synthesis,” in Proc. ASP-DAC, 2007, pp. 628–633.
[9] K.-H. Chang, D. Papa, I. Markov, and V. Bertacco, “InVerS: An in-
cremental veriﬁcation system with circuit similarity metrics and error
visualization,” in Proc. ISQED, 2007, pp. 487–494.
[10] S. Chatterjee and R. Brayton, “A new incremental placement algorithm
and its application to congestion-aware divisor extraction,” in Proc.
ICCAD, 2004, pp. 541–548.
[11] C.-W. Chang, C.-K. Cheng, P. Suaris, and M. Marek-Sadowska, “Fast
post-placement rewiring using easily detectable functional symmetries,”
in Proc. DAC, 2000, pp. 286–289.
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.PLAZA et al.: OPTIMIZING NONMONOTONIC INTERCONNECT USING SIMULATION AND RESTRUCTURING 2119
[12] C. Chu and Y.-C. Wong, “Fast and accurate rectilinear Steiner minimal
treealgorithmforVLSIdesign,”inProc.ISPD,2005,pp.28–35.[Online].
Available: http://class.ee.iastate.edu/cnchu/ﬂute.html
[13] N. Een and N. Sorensson, “An extensible SAT-solver,” in Proc. SAT,
2003, pp. 502–518. [Online]. Available: http://www.cs.chalmers.se/Cs/
Research/FormalMethods/MiniSat/
[14] W.Gosti,A.Narayan,R.Brayton,andA.Sangiovanni-Vincentelli,“Wire-
planning in logic synthesis,” in Proc. ICCAD, 1998, pp. 26–33.
[15] W. Gosti, S. Khatri, and A. Sangiovanni-Vincentelli, “Addressing the
timing closure problem by integrating logic optimization and placement,”
in Proc. ICCAD, 2001, pp. 224–231.
[16] M. Hrkic, J. Lillis, and G. Beraudo, “An approach to placement-coupled
logic replication,” in Proc. DAC, 2004, pp. 711–716.
[17] Y.-M. Jiang, A. Krstic, K.-T. Cheng, and M. Marek-Sadowska, “Post-
layout logic restructuring for performance optimization,” in Proc. DAC,
1997, pp. 662–665.
[18] L. Kannan, P. Suaris, and H. Fang, “A methodology and algorithms for
post-placement delay optimization,” in Proc. DAC, 1994, pp. 327–332.
[19] A. Kuehlmann, V. Paruthi, F. Krohm, and M. Ganai, “Robust Boolean
reasoning for equivalence checking and functional property veriﬁcation,”
IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 12,
pp. 1377–1394, Dec. 2002.
[20] V. N. Kravets and K. A. Sakallah, “Resynthesis of multi-level circuits
under tight constraints using symbolic optimization,” in Proc. ICCAD,
2002, pp. 687–693.
[21] C. Li, C.-K. Koh, and P. Madden, “Floorplan management: Incremental
placement for gate sizing and buffer insertion,” in Proc. ASP-DAC, 2005,
pp. 349–354.
[22] A. Mishchenko, S. Chatterjee, and R. Brayton, “DAG-aware AIG rewrit-
ing: A fresh look at combinational logic synthesis,” in Proc. DAC, 2006,
pp. 532–536.
[23] A. Mishchenko, S. Chatterjee, R. Jiang, and R. Brayton, “FRAIGs:
A unifying representation for logic synthesis and veriﬁcation,”
EECS Dept., UC Berkeley, Berkeley, CA, 2005. ERL Tech. Rep. [Online].
Available: http://www.eecs.berkeley.edu/~alanmi/publications/
[24] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik,
“Chaff: Engineering an efﬁcient SAT solver,” in Proc. DAC, 2001,
pp. 530–535.
[25] R. H. J. M. Otten and R. K. Brayton, “Planning for performance,” in Proc.
DAC, 1998, pp. 122–127.
[26] M. Pedram and N. Bhat, “Layout driven logic restructuring/
decomposition,” in Proc. ICCAD, 1991, pp. 134–137.
[27] S. Plaza, I. Markov, and V. Bertacco, “Optimizing non-monotonic in-
terconnect using functional simulation and logic restructuring,” in Proc.
ISPD, 2008, pp. 95–102.
[28] S. Plaza, K.-H. Chang, I. Markov, and V. Bertacco, “Node mergers in the
presence of don’t cares,” in Proc. ASP-DAC, 2006, pp. 414–419.
[29] P. Saxena, N. Menezes, P. Cocchini, and D. Kirkpatrick, “Repeater scaling
and its impact on CAD,” IEEE Trans. Comput.-Aided Design Integr.
Circuits Syst., vol. 23, no. 4, pp. 451–463, Apr. 2004.
[30] J. Marques-Silva and K. Sakallah, “GRASP: A search algorithm for
propositional satisﬁability,” IEEE Trans. Comput., vol. 48, no. 5, pp. 506–
521, May 1999.
[31] S. Sinha, A. Mishchenko, and R. Brayton, “Topologically constrained
logic synthesis,” in Proc. ICCAD, 2002, pp. 679–686.
[32] G. Stenz, B. Riess, B. Rohﬂeisch, and F. Johannes, “Timing driven place-
ment in interaction with netlist transformations,” in Proc. ISPD, 1997,
pp. 36–41.
[33] L. P. P. P van Ginneken, “Buffer placement in distributed RC-tree net-
works for minimal Elmore delay,” in Proc. ISCAS, 1990, pp. 865–868.
[34] J. Werber, D. Rautenbach, and C. Szegedy, “Timing optimization
by restructuring long combinatorial paths,” in Proc. ICCAD, 2007,
pp. 536–543.
[35] S. Yamashita, H. Sawada, and A. Nagoya, “SPFD: A new method to
express functional ﬂexibility,” IEEE Trans. Comput.-Aided Design Integr.
Circuits Syst., vol. 19, no. 8, pp. 840–849, Aug. 2000.
[36] Y.-S. Yang, S. Sinha, A. Veneris, and R. Brayton, “Automating
logic rectiﬁcation by approximate SPFDs,” in Proc. ASP-DAC, 2007,
pp. 402–407.
[37] Q. Zhu, N. Kitchen, A. Kuehlmann, and A. Sangiovanni-Vincentelli,
“SAT sweeping with local observability don’t-cares,” in Proc. DAC, 2006,
pp. 229–234.
[38] The International Technology Roadmap for Semiconductors. 2005
Edition, ITRS.
[39] [Online]. Available: http://iwls.org/iwls2005/benchmarks.html
[40] SynopsysDesignCompiler.[Online].Available:http://www.synopsys.com
[41] UMICH Physical Design Tools. [Online]. Available: http://vlsicad.eecs.
umich.edu/BK/PDtools/
Stephen M. Plaza received the M.S. degree in 2003
and his Ph.D. degree in 2008 from the University of
Michigan, Department of Electrical Engineering and
Computer Science.
He is currently a Senior Research and Develop-
ment Engineer at the Advanced Technology Group
of Synopsys. His areas of research interest include
logic synthesis, veriﬁcation, fault-tolerant design,
and SAT solving.
Igor L. Markov (S’97–M’01–SM’05) received the
M.A. degree in mathematics and the Ph.D. degree in
computer science from the University of California,
Los Angeles.
He is currently an Associate Professor with the
Department of Electrical Engineering and Computer
Science, University of Michigan, Ann Arbor. His
interests include computers that make computers
(software and hardware), secure hardware design,
combinatorial optimization with applications to the
design and veriﬁcation of integrated circuits, as well
as quantum logic circuits.
Prof. Markov is a member of the Editoral Board of the Communications of
the ACM,t h eACM Transactions on Design Automation of Electronic Systems,
the IEEE TRANSACTIONS ON COMPUTERS,a n dt h eI E E ET RANSACTIONS
ON COMPUTER-AIDED DESIGN. He has coauthored more than 150 refereed
publications, some of which were honored by the best paper awards at the
Design Automation and Test in Europe Conference (DATE), the Interna-
tional Symposium on Physical Design (ISPD) and the IEEE CAS Donald O.
Pederson Best Paper Award in the IEEE TRANSACTIONS ON COMPUTER-
AIDED DESIGN. Additionally, he is the recipient of a DAC Fellowship, an
ACM SIGDA Outstanding New Faculty Award, an ACM SIGDA Technical
Leadership Award, an NSF CAREER Award, an IBM Partnership Award, a
Synplicity, Inc. Faculty Award, and a Microsoft A. Richard Newton Break-
through Research Award. He served on a number of program committees and
chaired some of them. He graduated six Ph.D. students and is now working with
seven graduate students. His students won programming contests, fellowships,
and other awards at DAC 2001, ICCAD 2002, DAC 2004, ICCAD 2004, DATE
2005, IWLS 2005, ICCAD 2005, ISPD 2007, DATE 2008, and ISPD 2008.
He is currently a Senior Member of ACM and a member of the Executive
Board of ACM SIGDA. He was a recipient of the University of Michigan EECS
Department Outstanding Achievement Award.
Valeria M. Bertacco (S’95–M’03) received the Lau-
rea degree (summa cum laude) in computer engineer-
ing from the University of Padova, Padova, Italy, and
the M.S. and Ph.D. degrees in electrical engineering
from Stanford University, Stanford, CA, in 1998 and
2003, respectively.
She is an Assistant Professor of electrical en-
gineering and computer science (EECS) with the
University of Michigan, Ann Arbor. Her research
interests are in the areas of formal and semiformal
design veriﬁcation with emphasis on full design val-
idation and digital system reliability. She joined the faculty at the University
of Michigan after being a Staff Research Engineer for four years with the
Advanced Technology Group, Synopsys. Prior to Synopsys, she was with
Systems Science, Inc., a Palo Alto startup that developed Vera, a testbench
development language for veriﬁcation, later acquired by Synopsys.
Prof. Valeria is an Associate Editor of the IEEE TRANSACTIONS ON
COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS.S h e
has served on several program committees, including the Design Automation
Conference and the International Conference on Computer-Aided Design, and
as the Chair of the Veriﬁcation Committee of the Design Automation and Test
in Europe Conference. She has been leading the effort for the development of
the veriﬁcation chapter in the International Technology Roadmap for Semi-
conductors report since 2004. She was a recipient of the National Science
FoundationCAREERAwardandtheUniversityofMichiganEECSDepartment
Outstanding Achievement Award.
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 3, 2009 at 12:40 from IEEE Xplore.  Restrictions apply.