A Novel Approach to Minimizing Reconfiguration Cost for LUT-Based FPGAs by Raghuraman, Krishna Prasad et al.
Southern Illinois University Carbondale
OpenSIUC
Conference Proceedings, Department of Electrical and Computer Engineering
1-2005
A Novel Approach to Minimizing Reconfiguration
Cost for LUT-Based FPGAs
Krishna Prasad Raghuraman
Southern Illinois University Carbondale
Haibo Wang
Southern Illinois University Carbondale, haibo@engr.siu.edu
Spyros Tragoudas
Southern Illinois University Carbondale
Published in Raghuraman, K.P., Wang, H., & Tragoudas, S. (2005). A novel approach to minimizing
reconfiguration cost for LUT-based FPGAs. Proceedings of the 18th International Conference on
VLSI Design held jointly with 4th International Conference on Embedded Systems Design
(VLSID’05), 673-676. doi: 10.1109/ICVD.2005.25 ©2005 IEEE. Personal use of this material is
permitted. However, permission to reprint/republish this material for advertising or promotional
purposes or for creating new collective works for resale or redistribution to servers or lists, or to
reuse any copyrighted component of this work in other works must be obtained from the IEEE. This
material is presented to ensure timely dissemination of scholarly and technical work. Copyright and
all rights therein are retained by authors or by other copyright holders. All persons copying this
information are expected to adhere to the terms and constraints invoked by each author's copyright.
In most cases, these works may not be reposted without the explicit permission of the copyright
holder.
Recommended Citation
Raghuraman, Krishna Prasad; Wang, Haibo; and Tragoudas, Spyros, "A Novel Approach to Minimizing Reconfiguration Cost for
LUT-Based FPGAs" (2005). Conference Proceedings. Paper 40.
http://opensiuc.lib.siu.edu/ece_confs/40
A Novel Approach to Minimizing Reconfiguration Cost for LUT-Based FPGAs
Krishna Prasad Raghuraman, Haibo Wang, and Spyros Tragoudas
Department of Electrical and Computer Engineering
Southern Illinois University Carbondale, IL 62901
Abstract
This paper proposes a novel approach to reducing the size of FPGA
reconfiguration bitstreams by fixing appropriate orders for LUT
inputs. With such LUT input orders, memory locations that need to
be altered during partial reconfiguration are relocated into common
frames. We present a novel problem formulation that relates the
number of frames that need to be downloaded into the FPGA to the
number of minterms of a specially constructed logic function. A
heuristic procedure is developed to solve the formulated problem in
polynomial time. The proposed methodology is validated by
experiments conducted on the Xilinx Virtex FPGA platform. Our
experimental results show a considerable reduction in the size of
reconfiguration bitstreams.
1 Introduction
Implementing reconfigurable hardware using FPGAs is a very active
research direction, and quite a few FPGA-based reconfigurable
systems have been developed for real applications. One important
concern in such systems is reconfiguration cost, which is normally
proportional to the size of the reconfiguration bitstreams.
Numerous techniques [1, 2, 3, 4, 5, 6, 7] have previously been
presented to reduce FPGA reconfiguration cost at a high level. In
this work, we address the problem of minimizing reconfiguration
data size at the logic level. The techniques developed in this work
can be combined with previous high-level approaches to reduce the
size of FPGA reconfiguration data further.
In many LUT-based FPGAs, configuration data are normally
partitioned into frames [8, 9]. A frame is the smallest unit of
configuration data that can be read from or written into an FPGA.
This work proposes to permute LUT input orders such that the memory
bits that need to be altered during a reconfiguration are relocated
into common frames. Consequently, the number of reconfiguration
frames is reduced. A novel formulation of this optimization problem
is proposed in this paper, and an efficient algorithm is developed
to solve the formulated problem.
The platform used in our experiments is the Xilinx Virtex
architecture. Reconfiguration frames for Xilinx Virtex FPGAs are
explained using the example depicted in Figure 1; more details can
be found in the Virtex manuals [8, 9]. As shown in Figure 1, a
vertical column of FPGA real estate contains N LUTs, which belong
to different CLBs. Because it has four address inputs, each LUT has
16 memory locations. The 16 memory locations of any LUT in the
column belong to 16 different frames, and each frame contains N
bits, corresponding to the same memory location in each of the N
LUTs of the column. Since a frame is the smallest portion of
configuration data that can be accessed by reconfiguration
commands, the entire frame has to be written into the FPGA even if
only a single bit of one LUT changes during partial
reconfiguration. Although this arrangement seems to increase the
size of bitstreams during partial reconfiguration, it actually
lessens the burden of addressing each memory location.
Consequently, it simplifies hardware design and reduces the size of
reconfiguration bitstreams.
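The frame organization described above can be illustrated with a small sketch. It is a simplified model, not the actual Virtex bitstream layout: each LUT is assumed to be a 16-bit integer whose bit k is memory location k (numbered from 0 here, whereas Figure 1 numbers locations from 1), and frame k is assumed to hold exactly bit k of every LUT in the column. The function name and the sample column contents are hypothetical.

```python
from typing import Sequence

LOCATIONS_PER_LUT = 16  # a four-input LUT has 2**4 memory locations


def frames_to_write(before: Sequence[int], after: Sequence[int]) -> list[int]:
    """Return the indices of the frames that differ between two column images.

    Each element of `before` / `after` is a 16-bit integer holding the
    contents of one LUT; bit k is memory location k, which lives in frame k.
    """
    if len(before) != len(after):
        raise ValueError("both images must describe the same LUT column")
    changed = 0
    for old, new in zip(before, after):
        changed |= old ^ new  # locations whose content changes in this LUT
    return [k for k in range(LOCATIONS_PER_LUT) if (changed >> k) & 1]


# Hypothetical column of three LUTs; only location 5 of the second LUT flips.
column_before = [0x00FF, 0x0F0F, 0x3333]
column_after = [0x00FF, 0x0F0F ^ (1 << 5), 0x3333]
print(frames_to_write(column_before, column_after))  # [5]
```

Even though a single bit changes, one complete frame (N bits in this model) has to be rewritten, which is why gathering the changed bits of different LUTs into the same frames pays off.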
[Figure 1: a column of N LUTs (LUT1, LUT2, ..., LUT N), each with
memory locations 1 to 16, shown beside 16 frames of configuration
data. Frame k holds the configuration bit for memory location k of
every LUT in the column, from LUT1 to LUT N.]
Figure 1. Virtex configuration frames.
The rest of the paper is organized as follows. Section 2 explains
the proposed approach and describes the problem formulation.
Section 3 presents a heuristic algorithm to solve the formulated
problem. Section 4 discusses experimental results, and the paper is
concluded in Section 5.
Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design (VLSID’05) 
1063-9667/05 $20.00 © 2005 IEEE 
Authorized licensed use limited to: Southern Illinois University Carbondale. Downloaded on May 29, 2009 at 10:32 from IEEE Xplore.  Restrictions apply.
2 Preliminaries
The basic idea of the proposed approach is illustrated by the
following example. Assume that a column of LUTs contains two LUTs,
denoted LUT1 and LUT2, and that the functions implemented in both
LUTs are altered during reconfiguration. The original and final
functions of LUT1 are A · (B + C) and A + B, respectively, while
A · B + C and (A + B) · C are the original and final functions of
LUT2. We use three-input LUTs in this example to keep the
demonstration simple and clear. For the same reason, the example
assumes that the original and final functions of an LUT depend on
the same set of variables.
In the first scenario, we assume that the input order for both LUTs
is {A, B, C}. The contents stored in both LUTs before and after
reconfiguration are then as shown in Figure 2. Labels C1 and C2
indicate LUT data before and after reconfiguration, and asterisks
mark memory locations whose contents change during reconfiguration.
In this scenario, five frames need to be downloaded into the FPGA.
However, if we change the input order for LUT2 to {C, A, B}, we
need to download only three frames, as illustrated in Figure 3.
Address (A3 A2 A1)    000 001 010 011 100 101 110 111
LUT1 (A3 = A, A2 = B, A1 = C)
  C1 (before)          0   0   0   0   0   1   1   1
  C2 (after)           0   0   1   1   1   1   1   1
  changed                      *   *   *
LUT2 (A3 = A, A2 = B, A1 = C)
  C1 (before)          0   1   0   1   0   1   1   1
  C2 (after)           0   0   0   1   0   1   0   1
  changed                  *                   *
Figure 2. Reconfiguration data before permutation.
Address (A3 A2 A1)    000 001 010 011 100 101 110 111
LUT1 (A3 = A, A2 = B, A1 = C)
  C1 (before)          0   0   0   0   0   1   1   1
  C2 (after)           0   0   1   1   1   1   1   1
  changed                      *   *   *
LUT2 (A3 = C, A2 = A, A1 = B)
  C1 (before)          0   0   0   1   1   1   1   1
  C2 (after)           0   0   0   0   0   1   1   1
  changed                          *   *
Figure 3. Reconfiguration data after permutation.
Note that the output function of an LUT can be expressed either in
terms of its logic input signals (A, B, C in Figures 2 and 3) or in
terms of its address inputs (A3, A2, A1). For convenience, we refer
to the function defined in terms of logic input signals as the
logic function of the LUT, and to the function expressed in terms
of LUT address inputs as the LUT mapping function. For example,
LUT1 in Figure 2 before reconfiguration has logic function
A · (B + C) and mapping function A3 · (A2 + A1). The difference
between the two is the following. The logic function of an LUT
represents a cover of all the minterms of the function it
implements, whereas the mapping function is a cover of all the
memory locations that store logic 1. When permuting LUT inputs, we
change LUT mapping functions but keep logic functions untouched.
For example, LUT2 has the same logic function in Figures 2 and 3
but different mapping functions (A3 · A2 + A1 in Figure 2 and
A1 · A2 + A3 in Figure 3, both before reconfiguration).
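The two scenarios of the example can be checked with a short sketch. It assumes that the first variable of an input order drives the most significant address bit (A3), as in Figures 2 and 3; the helper names `lut_contents`, `changed_locations` and `frames_needed` are hypothetical and introduced only for illustration.

```python
from itertools import product


def lut_contents(func, order):
    """Contents of a LUT whose address inputs, most significant first,
    are wired to the logic variables named in `order`."""
    bits = []
    for address in product((0, 1), repeat=len(order)):
        assignment = dict(zip(order, address))
        bits.append(func(**assignment))
    return bits


def changed_locations(before, after):
    """Addresses whose stored bit differs between two LUT images."""
    return {i for i, (b, a) in enumerate(zip(before, after)) if b != a}


# Logic functions of the example (original and final, per LUT).
def lut1_before(A, B, C): return A & (B | C)
def lut1_after(A, B, C): return A | B
def lut2_before(A, B, C): return (A & B) | C
def lut2_after(A, B, C): return (A | B) & C


def frames_needed(lut2_order):
    """Frames to download when LUT1 keeps order ABC and LUT2 uses `lut2_order`."""
    f1 = changed_locations(lut_contents(lut1_before, "ABC"),
                           lut_contents(lut1_after, "ABC"))
    f2 = changed_locations(lut_contents(lut2_before, lut2_order),
                           lut_contents(lut2_after, lut2_order))
    return len(f1 | f2)


print(frames_needed("ABC"))  # 5, as in Figure 2
print(frames_needed("CAB"))  # 3, as in Figure 3
```

The logic functions are never modified; only the wiring between logic variables and address inputs changes, which moves the altered memory locations of LUT2 onto addresses that LUT1 already forces to be rewritten.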
The following presents the studied problem formally. The
formulation relates the number of frames that need to be downloaded
into the FPGA during partial reconfiguration to the number of
minterms of a specially constructed logic function. It allows us to
take advantage of well-developed function manipulation procedures
when tackling the problem of minimizing FPGA reconfiguration
bitstreams. The following notation is used in our discussion.
• N is the number of LUTs in one column.
• $\mathrm{LUT}_i$ denotes the ith LUT in the selected column, where
1 ≤ i ≤ N.
• $f_i^1$ and $f_i^2$ represent the mapping functions of
$\mathrm{LUT}_i$ before and after reconfiguration.
Functions $f_i^1$ and $f_i^2$ can be obtained as follows. Assume
that two circuits C1 and C2 will be implemented on an FPGA. C1 is
the original circuit and C2 is the circuit derived from C1 by
performing partial reconfiguration. Using any available FPGA design
tool, circuits C1 and C2 can be separately mapped into the same
area of the FPGA layout. Any LUT located in the mapped area, e.g.
$\mathrm{LUT}_i$, is assigned two logic functions, denoted by
$f_{\mathrm{LUT}_i}^1$ and $f_{\mathrm{LUT}_i}^2$:
$f_{\mathrm{LUT}_i}^1$ is used in the implementation of C1 and
$f_{\mathrm{LUT}_i}^2$ in that of C2. Such assignments build
one-to-one relations between LUT address inputs and the logic
variables on which $f_{\mathrm{LUT}_i}^1$ and
$f_{\mathrm{LUT}_i}^2$ depend. To obtain $f_i^1$ and $f_i^2$, we
simply substitute the logic variables in the logic function
expressions by their corresponding LUT address inputs.
The difference function between $f_i^1$ and $f_i^2$ is expressed
as:
$$F_i = f_i^1 \oplus f_i^2 \qquad (1)$$
Note that the minterms of $F_i$ represent the memory locations in
$\mathrm{LUT}_i$ that need to be altered to change the function
implemented on $\mathrm{LUT}_i$ from $f_{\mathrm{LUT}_i}^1$ to
$f_{\mathrm{LUT}_i}^2$. From the previous discussion, we know that
each minterm of $F_i$ requires a frame of configuration data. For a
given LUT column, the configuration frames that cover all the LUT
locations that need to be altered can be calculated by taking the
union of the difference functions of all LUTs in the column. That
is:
$$F = \bigcup_{i=1}^{N} F_i \qquad (2)$$
When performing the union operation, address inputs with the same
name but located in different LUTs (e.g. A1 of $\mathrm{LUT}_i$ and
$\mathrm{LUT}_j$) are treated as the same variable. This is valid
because address inputs in LUT mapping functions serve the same
purpose: they are used as coordinates to indicate memory locations
that contain logic 1. The final function obtained from the union
operation depends on only four variables, A4, A3, A2, A1. Hence,
the problem of minimizing reconfiguration data is translated into
the problem of finding input permutation orders for a set of logic
functions ($F_i$) such that the number of minterms of function $F$
is minimized.
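Equations (1) and (2) can be made concrete with a minimal sketch that stores each four-variable mapping function as a 16-bit truth-table mask, so that XOR, union and minterm counting become bit operations. This representation is an assumption made for illustration (the paper's implementation works on BDDs), and the mask values below are hypothetical.

```python
from typing import Iterable


def difference(f1: int, f2: int) -> int:
    """Equation (1): F_i = f_i^1 XOR f_i^2, one bit per memory location."""
    return f1 ^ f2


def union(functions: Iterable[int]) -> int:
    """Equation (2): F is the union (bitwise OR) of all difference functions."""
    result = 0
    for f in functions:
        result |= f
    return result


def minterm_count(f: int) -> int:
    """Number of minterms of f, i.e. number of frames to download."""
    return bin(f).count("1")


# Hypothetical mapping functions (before, after) for a column of two LUTs.
column = [(0x00FF, 0x0FFF), (0xAAAA, 0xAEAA)]
diffs = [difference(f1, f2) for f1, f2 in column]
F = union(diffs)

print([hex(d) for d in diffs])  # ['0xf00', '0x400']
print(minterm_count(F))         # 4
```

The two difference functions have 4 and 1 minterms, but the union has only 4, because the single changed location of the second LUT falls into a frame that the first LUT already requires. Input permutation aims to create exactly this kind of overlap.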
3 Proposed solution
Solving the above problem by exhaustive enumeration is very time
consuming, since there are $24^{N-1}$ possible combinations: a
column contains N LUTs, each LUT has four inputs and consequently
4! = 24 input permutations, and one LUT can keep its input order
because applying the same permutation to every LUT does not change
the number of minterms of F. To search efficiently for suitable LUT
input orders, this section presents a search procedure based on a
greedy algorithm. Its major steps are described in Figure 4. It
first constructs the LUT difference functions (line 3) and,
concurrently, finds the LUT that requires the least number of
reconfiguration frames (lines 4 ∼ 8). The input order of the
selected LUT is not permuted, and is used as a reference when
permuting the other LUT input orders. Function MintermCount, used
in line 4, counts the number of minterms of its operand function.
After the reference LUT is selected, the algorithm sequentially
picks an unprocessed LUT and permutes its inputs. The permutation
procedure is sketched in lines 12 to 22. It exhaustively tries all
possible permutations of that LUT's inputs and picks the one that
results in the smallest increase in the number of minterms of the
newly constructed union function ($F_{tmp}$). The procedure
evaluates 24 · (N − 1) permutations in total, so its time
complexity is linear in N, which is significantly smaller than the
time complexity of the exhaustive enumeration method.
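To give a feel for the gap between the two search strategies, the following sketch evaluates both counts for a few column sizes. The sizes 3, 9 and 18 are taken from the range used in the experiments; the function names are hypothetical.

```python
PERMUTATIONS_PER_LUT = 24  # 4! input orders of a four-input LUT


def exhaustive_candidates(n_luts: int) -> int:
    """Input-order combinations examined by exhaustive enumeration."""
    return PERMUTATIONS_PER_LUT ** (n_luts - 1)


def greedy_candidates(n_luts: int) -> int:
    """Permutations examined by the greedy procedure."""
    return PERMUTATIONS_PER_LUT * (n_luts - 1)


for n in (3, 9, 18):
    print(n, exhaustive_candidates(n), greedy_candidates(n))
```

For a column of three LUTs the two counts are 576 and 48; for 18 LUTs the greedy procedure examines 408 permutations, while the exhaustive count is 24 raised to the 17th power.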
1  min_tmp = 16
2  for i = 1 to N
3      F[i] = f1_i ⊕ f2_i
4      min = MintermCount(F[i])
5      if (min < min_tmp)
6          min_tmp = min
7          min_index = i
8          F = F[i]
9  for i = 1 to N
10     if (i ≠ min_index)
11         F = permute(F, F[i])
12 permute(F, F[i]) {
13     min_tmp = 16
14     for each permutation order of LUTi
15         derive new function F′[i] according to the new input order
16         Ftmp = F ∪ F′[i]
17         min = MintermCount(Ftmp)
18         if (min < min_tmp)
19             min_tmp = min
20             Order[LUTi] = current permut. order
21             Fmin = Ftmp
22     return Fmin }
Figure 4. The proposed search procedure.
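The procedure of Figure 4 can be sketched in runnable form as follows. This is an illustration under stated assumptions, not the authors' implementation: difference functions are held as 16-bit truth-table masks instead of BDDs, the helper `permute_inputs` and the tuple encoding of an input order are hypothetical, and the running minimum starts from "no candidate yet" rather than from 16 so that a result is always returned.

```python
from itertools import permutations

N_INPUTS = 4
N_LOCATIONS = 1 << N_INPUTS  # 16 memory locations per LUT


def minterm_count(mask: int) -> int:
    return bin(mask).count("1")


def permute_inputs(mask: int, order: tuple) -> int:
    """Re-address a LUT mask: address bit j of the new table is driven by
    the signal that drove address bit order[j] of the old table."""
    result = 0
    for new_addr in range(N_LOCATIONS):
        old_addr = 0
        for j, src in enumerate(order):
            if (new_addr >> j) & 1:
                old_addr |= 1 << src
        if (mask >> old_addr) & 1:
            result |= 1 << new_addr
    return result


def greedy_input_orders(diffs: list) -> tuple:
    """Greedy search over LUT input orders; returns (orders, union mask)."""
    # Lines 1-8: the LUT with the fewest changed locations is the reference.
    ref = min(range(len(diffs)), key=lambda i: minterm_count(diffs[i]))
    union = diffs[ref]
    orders = {ref: tuple(range(N_INPUTS))}
    # Lines 9-22: permute every other LUT against the running union.
    for i, f in enumerate(diffs):
        if i == ref:
            continue
        best = None
        for order in permutations(range(N_INPUTS)):
            candidate = union | permute_inputs(f, order)
            if best is None or minterm_count(candidate) < minterm_count(best[1]):
                best = (order, candidate)
        orders[i], union = best
    return orders, union


# Difference functions of the Section 2 example, with address 0b010 stored
# as bit 2 and so on: LUT1 changes locations {2, 3, 4}, LUT2 changes {1, 6}.
diffs = [0b0011100, 0b1000010]
orders, union = greedy_input_orders(diffs)
print(minterm_count(diffs[0] | diffs[1]))  # 5 frames without permutation
print(minterm_count(union))                # 3 frames with permutation
```

In this run the reference is LUT2, because it has fewer changed locations, and LUT1 is the one that gets permuted; the example in Section 2 permutes LUT2 instead, and both choices end with three frames.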
4 Experimental results
The proposed search procedure has been implemented on top of a
Binary Decision Diagram (BDD) package [10]. In the experimental
flow, we use ISCAS85 circuits as the initial circuits that are
implemented on FPGAs before reconfiguration. The hardware platform
used in our experiments is a Xilinx Virtex 1000 device, and the
Xilinx ISE design tool is used to map the example circuits and
generate configuration data.
Due to the lack of suitable partial reconfiguration benchmark
circuits, we derive the final FPGA circuits, which are to be
implemented after partial reconfiguration, by performing random
function modification on the original circuits. In this process, we
first define a set of functions, denoted by f1, f2, ..., fi, which
depend on variables A4, A3, A2, A1. Then, we derive the final logic
function for a selected LUT by performing either a COMPOSE or an
INTERSECT operation, using the original LUT function and one
function selected from f1, f2, ..., fi as operands. COMPOSE and
INTERSECT are function manipulation operations defined in the BDD
package. The choice of operation (COMPOSE or INTERSECT) and of
operand function (f1, f2, ..., fi) is fully randomized.
For the resulting FPGA circuits, we generate reconfiguration data
using the Xilinx ISE design automation tool. Initially, we simply
follow the traditional design flow without permuting LUT inputs.
Then, we regenerate the reconfiguration bits using the proposed
approach: we determine LUT input orders with the search procedure
of Section 3 and force the Xilinx design tool to keep these orders
during the generation of reconfiguration data. The sizes of the
reconfiguration bitstreams obtained using the two approaches are
compared in Table 1. It shows that a reduction of around 15% in the
number of frames can be achieved by using the proposed method. In
the experiment, we also vary the number of LUTs per column to
provide more case studies.
Table 1. Comparison of reconfiguration frames.
Circuit   No. of LUTs   No. of frames     No. of frames    Savings
name      per column    without permut.   with permut.     (%)
C432      3             274               244              11
          4             233               212              10
          8             166               136              18
C1355     3             142               137              4
          6             124               111              10
          9             106               95               10
C1908     3             255               239              10
          6             198               175              12
          9             172               141              18
C2670     3             430               389              10
          6             334               286              14
          9             276               232              16
C3540     6             771               659              15
          9             632               506              20
          12            567               409              28
C5315     9             769               617              21
          12            626               574              8
          15            542               440              19
C6288     12            1168              964              17
          15            986               826              16
          18            852               712              16
C7552     12            967               780              9
          15            814               660              19
          18            693               570              18
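As a reading aid for the last column of Table 1, the sketch below computes the relative frame reduction for the three C3540 rows. The paper does not state how the savings column was computed, so the formula (reduction relative to the count without permutation, rounded to the nearest percent) is an assumption; the function name is hypothetical.

```python
def savings_percent(frames_without: int, frames_with: int) -> float:
    """Relative reduction in frame count, in percent (assumed definition)."""
    return 100.0 * (frames_without - frames_with) / frames_without


# (LUTs per column, frames without permutation, frames with permutation)
c3540_rows = [(6, 771, 659), (9, 632, 506), (12, 567, 409)]
for luts, without, with_perm in c3540_rows:
    print(luts, round(savings_percent(without, with_perm)))
```

Under this assumption the three rows give 15, 20 and 28 percent, in agreement with the values reported for C3540.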
5 Concluding remarks
In this paper, we presented a methodology to reduce the size of
reconfiguration bitstreams for LUT-based FPGAs. This is achieved by
properly ordering LUT inputs when mapping circuits into FPGAs. The
problem of finding such orders is formulated and solved using a
heuristic algorithm based on a greedy method. Our approach tackles
the problem of minimizing FPGA reconfiguration cost at the logic
level. To the best of our knowledge, this type of problem has
rarely been addressed at the logic level, and our work demonstrates
a new dimension for minimizing FPGA reconfiguration cost.
Experimental results show that the size of reconfiguration data can
be reduced by around 15% by the proposed method alone. In addition,
the proposed method can be combined, without any compromise, with
other techniques that reduce FPGA reconfiguration cost through
high-level optimization, to reduce that cost further.
References
[1] Zhining Huang and Sharad Malik, “Managing dynamic
reconfiguration overhead in systems-on-a-chip design using
reconfigurable datapaths and optimized interconnection networks,”
in Proceeding of Conference of Design, Automation and Test in
Europe, pp. 13–16, 2001.
[2] Daler Rakhmatov and Sarma B.K. Vrudhula, “Minimizing routing
configuration cost in dynamically reconfigurable FPGAs,” in
Proceedings of Parallel and Distributed Processing Symposium,
pp. 1481–1488, 2001.
[3] K. Compton, J. Cooley and S. Knol, “Configuration relocation
and defragmentation for reconfigurable computing,” in Proc. of
FPCCM, pp. 79–80, 2000.
[4] K. Compton, Z. Li, S. Knol and S. Hauck, “Configuration
relocation and defragmentation for reconfigurable computing,”
vol. 10, pp. 209–220, 2002.
[5] Anna Antola, Vincenzo Piuri, Mariagiovanna Sami, “On-line
Diagnosis and Reconfiguration of FPGA Systems,” in Proceeding of
Electronic Design, Test and Applications, pp. 291–296, 2002.
[6] Douglas Chang and Malgorzata Marek-Sadowska, “Partitioning
Sequential Circuits on Dynamically Reconfigurable FPGAs,” IEEE
Transaction on Computers, vol. 48, no. 6, pp. 565–578, 1999.
[7] S. Dueck and W. Kinsner, “Netlist Partitioning for FPGA-Based
Run-Time Reconfiguration,” in Proceedings of 2002 IEEE Canadian
Conference on Electrical and Computer Engineering, pp. 584–590,
2002.
[8] XILINX Inc., Virtex Series Configuration Architecture User
Guide, 2003.
[9] XILINX Inc., Two Flows for Partial Reconfiguration: Module
Based or Small Bit Manipulations, 2002.
[10] Fabio Somenzi, “Cudd package.”
http://vlsi.colorado.edu/~fabio/CUDD/cuddIntro.html.
