Estimating circuit delays in FPGAs after technology mapping by Severens, Berg et al.
Estimating Circuit Delays in FPGAs after
Technology Mapping
Berg Severens, Elias Vansteenkiste, Karel Heyse, Dirk Stroobandt
Hardware and Embedded Systems Team, Computer Systems Lab
Department of Electronics and Information Systems
Ghent University, Belgium
{Berg.Severens, Elias.Vansteenkiste, Karel.Heyse, Dirk.Stroobandt}@ugent.be
Abstract—An FPGA implementation requires a significant
effort from the hardware designer, who optimizes FPGA designs
by going through many time-consuming CAD flow iterations.
These iterations provide two types of feedback: (1) the FPGA
performance and (2) the identification of the parts having the
highest impact on the FPGA performance. Both depend on the
wirelength behavior.
Studies have been dedicated to the estimation of local [5] and
global [4] wirelengths, but to our knowledge neither performance
estimation nor identification of the critical zone is present
in the literature. Therefore this paper first presents a comparison
of three performance estimation techniques: logic depth, Monte
Carlo simulation and fast placement (ordered from low to high
accuracy and runtime). Second, four methods for identifying the
critical zone are compared. Results show that Monte Carlo
simulations provide a good identification of the parts having the
highest impact on the performance. We conclude that Monte
Carlo simulations provide useful feedback within a short runtime
(about 30 times faster than placement), reducing the
time-to-market of FPGA implementations.
I. INTRODUCTION
FPGAs are digital chips that can be configured by the customer.
Thanks to mass production and configurability, FPGAs
are ideally suited for medium-volume electronic applications.
The FPGA CAD tool flow used during the design process
typically comprises five steps [8], see Figure 1. First, HDL
code is converted to an and-inverter graph during logic synthesis.
Then the graph is mapped onto a network of Lookup
Tables (LUTs). Subsequently the packer groups the LUTs into
clusters, which are assigned to the FPGA grid in the placement
step. Finally, routing determines the signal paths between these
blocks. This paper presents early performance feedback after
technology mapping. As such, the hardware designer is able
to fine-tune the HDL code without executing the slow packing,
placement and routing steps.
In this paper we explore the possibility of a shorter
feedback loop via accurate timing estimates after technology
mapping in order to avoid the time-consuming full iterations
(12 hours or more is not uncommon [7]). This optimization
cycle can be shortened significantly: logic synthesis and
technology mapping are responsible for 23% of the total CAD
Fig. 1. The conventional way to optimize the FPGA’s performance is by
iterating over the whole CAD tool flow. This paper presents a shorter feedback
loop by providing information after technology mapping.
VTR [8] benchmarks through Vivado [1]. Furthermore, this
tool provides performance estimates after mapping, which yield
a correlation of 0.989 with the post-routing delays for the same
benchmarks (estimation errors of tens of percent are not
uncommon). The logic depth, which is a very simple metric,
already provides a correlation of 0.979 (in the remainder
of this paper, a collection of MCNC and VTR benchmarks
will be used). The modest Vivado accuracy indicates that
significant improvements to the state-of-the-art estimations
might be possible and necessary for a reliable design process.
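To make the logic depth metric concrete, the following is a minimal sketch that computes it for a mapped LUT network. It assumes a purely combinational network given as a dictionary from each LUT to its fan-in signals; the names `logic_depth` and `EXAMPLE_NETWORK` are hypothetical and not part of any tool used in this paper.

```python
from graphlib import TopologicalSorter

def logic_depth(fanins):
    """Return the largest number of LUTs on any input-to-output path.

    `fanins` maps each LUT name to the names of the signals driving it.
    Names that never appear as a key are treated as primary inputs
    (depth 0). This input format is an assumption of this sketch.
    """
    depth = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(fanins).static_order():
        preds = fanins.get(node, ())
        depth[node] = 1 + max(depth[p] for p in preds) if preds else 0
    return max(depth.values(), default=0)

# Hypothetical four-LUT network: a, b and c are primary inputs.
EXAMPLE_NETWORK = {
    "n1": ["a", "b"],
    "n2": ["n1", "c"],
    "n3": ["n2", "n1"],
    "out": ["n3"],
}

print(logic_depth(EXAMPLE_NETWORK))
```

The metric ignores wire delays completely, which is why it is cheap to compute and why it can only serve as a coarse estimate.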
To our knowledge this is the first paper focussing on estimating
total circuit delay after technology mapping. However,
related literature can be found: extensive research considered
local wirelength estimations [5] and total wirelength
estimations [4]. In [3] fast placement was used as an early timing
feedback model for improved technology mapping. A possible
reason for the limited exploration of our topic is [9], which
discourages interconnect prediction, stating that the critical path
is determined by the exceptionally long wires, which are difficult
to estimate. However, Monte Carlo simulations include these
exceptions in a reasonably accurate way (see Section III).
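The idea that random sampling can account for occasional long wires can be illustrated with a small sketch. It is not the estimator of Section III: it simply assumes that every connection delay is drawn independently from a user-supplied distribution and averages the resulting critical path delay over many samples. All names (`critical_delay`, `monte_carlo_delay`, `EXAMPLE_NETWORK`) and the log-normal delay model are assumptions made for illustration only.

```python
import random
from graphlib import TopologicalSorter

def critical_delay(fanins, edge_delay, lut_delay=0.0):
    """Longest path delay through the network for one set of wire delays."""
    arrival = {}
    for node in TopologicalSorter(fanins).static_order():
        preds = fanins.get(node, ())
        if preds:
            arrival[node] = lut_delay + max(
                arrival[p] + edge_delay[(p, node)] for p in preds
            )
        else:
            arrival[node] = 0.0  # primary input
    return max(arrival.values(), default=0.0)

def monte_carlo_delay(fanins, sample_wire_delay, runs=1000, seed=0):
    """Average critical path delay over `runs` random wire-delay assignments."""
    rng = random.Random(seed)
    edges = [(p, n) for n, preds in fanins.items() for p in preds]
    total = 0.0
    for _ in range(runs):
        delays = {edge: sample_wire_delay(rng) for edge in edges}
        total += critical_delay(fanins, delays)
    return total / runs

# Hypothetical network and a skewed delay model (assumption, not measured data):
# most wires are short, a few samples are exceptionally long.
EXAMPLE_NETWORK = {
    "n1": ["a", "b"],
    "n2": ["n1", "c"],
    "n3": ["n2", "n1"],
    "out": ["n3"],
}
estimate = monte_carlo_delay(
    EXAMPLE_NETWORK, lambda rng: rng.lognormvariate(0.0, 0.5), runs=500
)
print(f"estimated circuit delay: {estimate:.2f}")
```

Because each sample takes the maximum over all paths, rare long wires raise the estimate whenever they land on a path that becomes critical, which a single average wire delay per connection would not capture.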
The outline of this paper is as follows: in the next section
we will discuss the influence of the non-deterministic
placement on the post-mapping estimation accuracy. Section
III contains two different ways to estimate performance after
technology mapping: fast placement and Monte Carlo
simulations. Subsequently Section IV contains a discussion about


the critical neighbouring connections. (3) L was on the critical
path, but when its delay is zero, the critical path changes. In
this case td is a strictly positive value, but lower than in the
previous case.
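Case (3) suggests reading td as the drop in circuit delay obtained when the delay of L is set to zero. Under that assumption, and modelling L as a single connection, the behaviour can be sketched as follows; the helper names and the three-connection example are hypothetical.

```python
from graphlib import TopologicalSorter

def circuit_delay(fanins, edge_delay):
    """Longest input-to-output path delay; only connection delays are counted."""
    arrival = {}
    for node in TopologicalSorter(fanins).static_order():
        preds = fanins.get(node, ())
        arrival[node] = max(
            (arrival[p] + edge_delay[(p, node)] for p in preds), default=0.0
        )
    return max(arrival.values(), default=0.0)

def delay_impact(fanins, edge_delay, edge):
    """Assumed definition: circuit delay reduction when `edge` has zero delay."""
    zeroed = {**edge_delay, edge: 0.0}
    return circuit_delay(fanins, edge_delay) - circuit_delay(fanins, zeroed)

# Two paths reach "out": a -> x -> out (delay 5.0) and b -> out (delay 3.5).
FANINS = {"x": ["a"], "out": ["x", "b"]}
DELAYS = {("a", "x"): 1.0, ("x", "out"): 4.0, ("b", "out"): 3.5}

# Off the critical path: no impact.
impact_off_path = delay_impact(FANINS, DELAYS, ("b", "out"))   # 0.0
# On the critical path, which stays critical: impact equals its own delay.
impact_stays = delay_impact(FANINS, DELAYS, ("a", "x"))        # 1.0
# On the critical path, but the other path takes over: 0 < impact < 4.0.
impact_changes = delay_impact(FANINS, DELAYS, ("x", "out"))    # 1.5
```

The last value shows why the impact of a connection cannot be read from its own delay alone: once the second path becomes critical, any further reduction on the first path is wasted.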
B. Comparison of Delay Impact Estimation Techniques
This section contains a comparison of the four proposed
delay impact estimation techniques. A valid experimental
setup to compare different techniques on estimating the delay
impact of a node should include the structural variation of the
placement (see Section II). Therefore consider the following
experimental setup, based on our definition of delay impact
per collection of nodes:
• We run the MCNC benchmark suite
with the homogeneous VPR architecture
k6_frac_N10_mem32K_40nm for 10 different
placements. The average circuit delays can be found
in the column ’Original’ in Table I.
• For each of the four presented techniques, we do the
following: per benchmark, the 10 output nodes (or
latch inputs) with the highest estimated impact are cut
out, as if they were perfectly optimized by the designer.
Removed nodes are called “left-out-nodes” in the
following. The newly mapped circuits are packed, placed
and routed all over again for 10 different placements.
Table I depicts the average delay results per technique.
The results of Table I show that the average delay reduction
after node removal is about the same when the nodes are
selected with normal placement (-5.6%) and with fast
placement (-5.4%). Monte Carlo simulations did slightly better,
for both the criticality (-7.0%) and the EAD (-7.7%). However,
this experimental setup has limitations. Probably the most
important source of uncertainty is the following: the removal
of an output node also removes the inputs of this node. The
number of additionally removed nodes can affect the delays of
the remaining output nodes. On the other hand this effect should
not be overestimated, as the correlation between the number of
LUTs and the circuit delay is very low (0.37), indicating that
the number of nodes does not affect the delay drastically. Due
to the remaining uncertainty we do not claim that Monte Carlo
is able to predict the impact more accurately than post-routing
slacks. But we can state that Monte Carlo might be able to yield
a reasonably accurate impact estimate, which is a surprising
result for a low-runtime technique.
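The Geomean and Gain rows of Table I can be recomputed from the per-benchmark delays. The sketch below does so for the 'Original' and 'Monte Carlo EAD' columns, assuming the gain is the relative change of the geometric mean; the variable and function names are hypothetical, and small deviations from the table are expected because the tabulated delays are rounded.

```python
from statistics import geometric_mean

# Circuit delays in ns, copied from Table I (apex4 ... spla).
ORIGINAL = [5.56, 2.56, 8.6, 3.77, 4.87, 2.63, 7.08, 4.84,
            9.17, 4.74, 6.91, 5.09, 4.74, 4.83, 6.83]
MONTE_CARLO_EAD = [4.97, 2.18, 3.69, 4.00, 4.84, 2.55, 7.19, 4.81,
                   9.19, 4.43, 6.71, 5.35, 4.94, 4.53, 6.54]

def gain_percent(before, after):
    """Relative change of the geometric mean delay, in percent."""
    return 100.0 * (geometric_mean(after) / geometric_mean(before) - 1.0)

print(f"geomean original: {geometric_mean(ORIGINAL):.2f} ns")
print(f"geomean EAD:      {geometric_mean(MONTE_CARLO_EAD):.2f} ns")
print(f"gain:             {gain_percent(ORIGINAL, MONTE_CARLO_EAD):.1f} %")
```

The geometric mean is used so that large benchmarks such as clma or frisc do not dominate the summary the way they would in an arithmetic mean of absolute delays.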
V. CONCLUSION
This paper explored the trade-off between accuracy and
runtime for performance and performance impact feedback
information. The accuracy of the estimations is affected by the
non-deterministic behavior of the placement algorithm. Firstly,
three feedback performance estimations were compared: logic
depth, Monte Carlo simulations and fast placement. All three
estimations are Pareto-optimal considering both accuracy and
runtime, where Monte Carlo simulations provide a good
trade-off between the two considered dimensions. Secondly,
four impact estimations were compared, indicating that the
accuracy of the Monte Carlo techniques is similar to the
accuracy of normal post-routing information. We conclude
that Monte Carlo simulations are reasonably accurate for both
performance and performance impact estimations within a low
runtime. The combination of these properties in a tool can
help the hardware designer to reduce the time-to-market of an
FPGA implementation.
TABLE I. CIRCUIT DELAYS (NS) WITH 10 LEFT-OUT-NODES FOR
DIFFERENT NODE REMOVAL DECISION TECHNIQUES. THE EAD, A
LOW-RUNTIME PERFORMANCE ESTIMATION BASED ON MONTE CARLO
SIMULATIONS, PROVIDES A BETTER IMPACT ESTIMATION THAN
POST-ROUTING SLACKS.
Benchmark Original  Normal Placement Fast Placement   Monte Carlo      Monte Carlo
                                                      Criticality      EAD
          [ns]      [ns]    [%]      [ns]    [%]      [ns]    [%]      [ns]    [%]
apex4     5.56      5.01    -9.9     4.58    -17.6    4.97    -10.6    4.97    -10.6
bigkey    2.56      2.01    -21.5    2.00    -21.9    2.09    -18.4    2.18    -14.8
clma      8.6       6.57    -23.6    6.83    -20.6    3.69    -57.1    3.69    -57.1
des       3.77      4.11    9.0      3.86    2.4      4.01    6.4      4.00    6.1
diffeq    4.87      5.17    6.2      4.99    2.5      4.85    -0.4     4.84    -0.6
dsip      2.63      2.59    -1.5     2.57    -2.3     2.62    -0.4     2.55    -3.0
elliptic  7.08      7.26    2.5      7.07    -0.1     7.55    6.6      7.19    1.6
ex5p      4.84      4.96    2.5      5.08    5.0      5.06    4.5      4.81    -0.6
frisc     9.17      8.94    -2.5     9.17    0.0      8.76    -4.5     9.19    0.2
misex3    4.74      4.03    -15      4.08    -13.9    4.43    -6.5     4.43    -6.5
pdc       6.91      6.99    1.2      6.96    0.7      6.92    0.1      6.71    -2.9
s38417    5.09      5.83    14.5     5.56    9.2      5.73    12.6     5.35    5.1
s38584.1  4.74      4.53    -4.4     4.53    -4.4     4.92    3.8      4.94    4.2
seq       4.83      4.22    -12.6    4.75    -1.7     4.48    -7.2     4.53    -6.2
spla      6.83      5.6     -18      6.04    -11.6    6.43    -5.9     6.54    -4.2
Geomean   5.16      4.87             4.88             4.79             4.76
Gain                        -5.6%            -5.4%            -7.0%            -7.7%
REFERENCES
[1] Vivado design suite, 2014.
[2] M. Gort and J. Anderson. Analytical placement for heterogeneous
FPGAs. In Field Programmable Logic and Applications (FPL), 2012
22nd International Conference on, pages 143–150, Aug 2012.
[3] J. Y. Lin, A. Jagannathan, and J. Cong. Placement-driven technology
mapping for LUT-based FPGAs. In Proceedings of the ACM Int.
Symposium on FPGAs, pages 121–126, 2003.
[4] Q. Liu, J. Ma, and Q. Zhang. Neural network based pre-placement
wirelength estimation. In FPT, pages 16–22. IEEE, 2012.
[5] V. Manohararajah, G. R. Chiu, P. Singh, and S. D. Brown. Predicting
interconnect delay for physical synthesis in a FPGA CAD flow.
[6] G. Mummolo. Measuring uncertainty and criticality in network planning
by PERT-path technique. International Journal of Project Management,
15(6):377–387, 1997.
[7] K. Murray, S. Whitty, S. Liu, J. Luu, and V. Betz. Titan: Enabling large
and complex benchmarks in academic CAD. In Field Programmable
Logic and Applications (FPL), 2013 23rd International Conference on,
pages 1–8, Sept 2013.
[8] J. Rose, J. Luu, C. W. Yu, O. Densmore, J. Goeders, A. Somerville, K. B.
Kent, P. Jamieson, and J. Anderson. The VTR project: Architecture
and CAD for FPGAs from Verilog to routing. In Proceedings of the
ACM/SIGDA International Symposium on Field Programmable Gate
Arrays, FPGA ’12, pages 77–86, New York, NY, USA, 2012. ACM.
[9] L. Scheffer and E. Nequist. Why interconnect prediction doesn’t work.
In Proceedings of the 2000 International Workshop on System-level
Interconnect Prediction, SLIP ’00, pages 139–144, New York, NY, USA,
2000. ACM.
