Architecture of a Multi-Context FPGA Using Reconfigurable Context Memory
Weisheng CHONG, Sho Ogata, Masanori HARIYAMA and Michitaka KAMEYAMA
Graduate School of Information Sciences, Tohoku University
Aoba 6–6–05, Aramaki, Aoba–ku, Sendai,
Miyagi, 980–8579, Japan
{cwsheng, ogata, hariyama, michi}@kameyama.ecei.tohoku.ac.jp
Abstract
Dynamically programmable gate arrays (DPGAs) promise lower-cost implementations than conventional FPGAs because they efficiently reuse limited hardware resources in time. One typical DPGA architecture is the multi-context FPGA (MC-FPGA), which has multiple memory bits per configuration bit, forming configuration planes for fast switching between contexts. The additional memory planes cause significant overhead in area and power consumption. To overcome this overhead, a fine-grained reconfigurable architecture called reconfigurable context memory (RCM) is presented, based on the fact that there are redundancy and regularity in configuration bits between different contexts. Switch blocks are implemented efficiently by using the RCM as context decoders and routing switches. By using the RCM in logic blocks, an adaptive multi-context logic block is introduced in which the size of the look-up tables and the number of different configuration planes of the look-up tables are determined adaptively at each logic block. Moreover, non-volatile ferroelectric-based functional pass-gates are used as components of the RCM to achieve compactness and low static power. Under the constraint of the same number of contexts, the area of the proposed MC-FPGA is 45% of that of the conventional MC-FPGA. In the functional-pass-gate-based evaluation, the area of the proposed MC-FPGA is reduced to 37% of that of the conventional MC-FPGA.
1. Introduction
Dynamically programmable gate arrays (DPGAs) [1] provide more cost-effective implementations than conventional FPGAs, in which hardware resources are dedicated to a single context. A DPGA can be sequentially configured as different processors in real time, and thus efficiently reuses limited hardware resources in time.
Figure 1. Overall structure of an MC-FPGA
Figure 2. Conventional multi-context switch (four contexts)
[Example shown: switch G9 with one memory bit per context, C3 = C2 = C1 = C0 = 1, selected by the context-ID bits S1 and S0.]
One typical DPGA architecture is the multi-context FPGA (MC-FPGA). MC-FPGAs have multiple memory bits per configuration bit, forming configuration planes for fast switching between contexts. However, the additional memory planes cause significant overhead in area and power consumption [2]. Figure 1 shows the overall structure of an MC-FPGA. Each cell consists of a programmable logic block and a programmable switch block. Figure 2 shows the structure of a conventional multi-context switch. The switch has one memory bit per context, and the active configuration bit is selected from these memory bits according to a context ID. In the conventional approach, each switch requires n memory bits to store n contexts. Most previous work on DPGAs reduces the overhead using device-level solutions; that is, compact memory devices such as DRAM and FeRAM are used to store configuration data [1, 3].
Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05)
1530-2075/05 $ 20.00 IEEE
Authorized licensed use limited to: TOHOKU UNIVERSITY. Downloaded on February 5, 2009 at 01:34 from IEEE Xplore.  Restrictions apply.
Table 1. Redundancy and regularity in configuration data
Columns: Context 3 (C3), Context 2 (C2), Context 1 (C1), Context 0 (C0)
Switch   Configuration bits over the four contexts
G1       1 in one context, 0 in the other three
G2       alternating 0 and 1 (same as G4)
G3       0 in all four contexts
G4       alternating 0 and 1 (same as G2)
G9       1 in all four contexts
Table 2. Relations between contexts and context-ID bits
         Context 0   Context 1   Context 2   Context 3
S0       0           1           0           1
S1       0           0           1           1
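As a rough illustration of the conventional scheme described above, the following Python sketch models a four-context switch that stores one memory bit per context and selects it with the context ID. The class and function names are hypothetical and chosen only for this example; the decoding of (S1, S0) as a binary context number is taken from Table 2.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConventionalSwitch:
    """One routing switch with a dedicated memory bit per context."""
    memory_bits: Tuple[int, ...]  # memory_bits[k] is the configuration bit for context k

    def configuration_bit(self, s1: int, s0: int) -> int:
        # The context ID (S1, S0) is read as the binary number S1 S0 (Table 2).
        context = 2 * s1 + s0
        return self.memory_bits[context]


def memory_bits_required(num_switches: int, num_contexts: int) -> int:
    # In the conventional scheme, n contexts cost n memory bits per switch.
    return num_switches * num_contexts


g9 = ConventionalSwitch(memory_bits=(1, 1, 1, 1))  # always on, as in Fig. 2
print(g9.configuration_bit(s1=1, s0=0))
print(memory_bits_required(num_switches=9, num_contexts=4))
```

Even for a switch such as G9 that never changes, this model spends four memory bits, which is the overhead the proposed architecture targets.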
To reduce the overhead of configuration memory in MC-FPGAs, this paper proposes an architectural-level solution based on the fact that there are redundancy and regularity in configuration bits between contexts. To illustrate the redundancy and regularity, Table 1 shows an example of the configuration data of the switch block shown in Fig. 1. Each row denotes the configuration data of one switch. The configuration data of G3 and G9 are redundant in themselves; that is, their configuration bits do not change between contexts. It has been reported that less than 3% of configuration data change when contexts are switched [4]. There is another type of redundancy between the configuration data of different switches. For example, G2 and G4 have the same configuration data. Moreover, there is regularity in configuration data such as those of G2 and G4, which can be represented by repeating the bits (0, 1). To exploit the redundancy and regularity, a fine-grained reconfigurable architecture called “reconfigurable context memory” is presented. The switch block consists of fine-grained switch elements, each of which uses a single 2-to-1 multiplexer, two memory bits and a pass-gate as components.
Figure 3. Configuration-bit patterns that are independent of a context ID (the patterns 0000 and 1111 over contexts C0 to C3, each generated by a single memory bit)
The switch elements are used in two ways. First, they are used as programmable interconnections between logic blocks, as in conventional FPGAs. Second, they are used to build reconfigurable decoders that generate configuration bits from the context ID. By exploiting the redundancy and regularity in configuration data, the decoders are configured in an area-efficient way. Moreover, we show that ferroelectric-based functional pass-gates [5] can be used to implement the switch elements for area efficiency. To exploit the redundancy of configuration data in logic blocks, an adaptive multi-context logic block is introduced. The number of inputs of the look-up tables (LUTs) and the number of different configuration planes of the LUTs are determined adaptively at each logic block. LUTs with a larger number of inputs reduce the total number of LUTs required for a mapping.
Under the constraint of the same number of contexts, the area of the MC-FPGA using the reconfigurable context memory and adaptive multi-context logic blocks is compared with that of a conventional MC-FPGA. In the CMOS-circuit-based evaluation, the area of the proposed MC-FPGA is 45% of that of the conventional MC-FPGA. In the functional-pass-gate-based evaluation, the area of the proposed MC-FPGA is 37% of that of the conventional MC-FPGA.
2. Redundancy and Regularity in Configuration Data
Figure 4. Configuration-bit patterns that depend on a context-ID bit (the patterns 0101, 1010, 0011 and 1100 over contexts C0 to C3, each generated from S0, S1 or the complement of one of them)
Redundancy and regularity in configuration data can be used to reduce the area of the context memory. In this paper, an architecture with four contexts is considered as an example, although the approach is also applicable to architectures with other numbers of contexts. Contexts are switched by a 2-bit context ID (bits S1 and S0) as shown in Table 2. For a 4-context switch, there are 16 possible configuration-bit patterns, as listed in Figs. 3, 4 and 5. Note that each row in the figures represents one of the configuration-bit patterns for the switch. Figure 3 shows the configuration-bit patterns that are independent of the context ID because the switch is programmed to be always on or always off. A single memory bit is sufficient to control the switch, whereas four memory bits are required for the conventional switch shown in Fig. 2. Figure 4 shows the configuration-bit patterns that depend on a single context-ID bit. Note that each bit pattern is the same as the bit pattern of S1 or S0 shown in Table 2, or of the complement of one of them. A switch using a single context-ID bit is smaller than a conventional switch, which uses two context-ID bits. Figure 5 shows the remaining configuration-bit patterns, which depend on both S1 and S0. Each bit pattern can be generated using a 2-to-1 multiplexer, as shown in the right-most column of Fig. 5. The multiplexer is slightly larger than the hardware needed to generate the bit patterns shown in Figs. 3 and 4. However, the bit patterns in Fig. 5 are not frequently used in a multi-context architecture since less than 3% of configuration bits change when contexts are switched [4].
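The partition of the 16 patterns into the three groups of Figs. 3, 4 and 5 can be checked with a short enumeration. The sketch below is only an illustration: the function names are hypothetical, and patterns are written as (C0, C1, C2, C3) with S1 and S0 taken from Table 2.

```python
from itertools import product

# Context-ID bits per context (Table 2): context k has S1 = k // 2 and S0 = k % 2.
S0 = (0, 1, 0, 1)
S1 = (0, 0, 1, 1)


def invert(bits):
    """Bitwise complement of a pattern."""
    return tuple(1 - b for b in bits)


def classify(pattern):
    """Classify a pattern (C0, C1, C2, C3) by the context-ID bits it depends on."""
    if pattern in ((0, 0, 0, 0), (1, 1, 1, 1)):
        return "constant"
    if pattern in (S0, invert(S0), S1, invert(S1)):
        return "one ID bit"
    return "two ID bits"


counts = {}
for pattern in product((0, 1), repeat=4):
    kind = classify(pattern)
    counts[kind] = counts.get(kind, 0) + 1
print(counts)
```

The enumeration yields two constant patterns, four patterns that follow a single context-ID bit or its complement, and ten patterns that need both bits, which agrees with the number of rows in Figs. 3, 4 and 5.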
3. Switch Block Architecture Using the Reconfigurable Context Memory
Figure 5. Configuration-bit patterns that depend on two context-ID bits (the ten remaining patterns 0001, 0010, 0100, 0110, 0111, 1000, 1001, 1011, 1101 and 1110, each generated by a 2-to-1 multiplexer driven by S1 and S0)
Figure 6. Basic MC-FPGA architecture using the reconfigurable context memory as switch blocks
Figure 7. Structure of the reconfigurable context memory: (a) overall structure; (b) programmable switch; (c) input controller
Figure 6 shows a basic MC-FPGA architecture that uses reconfigurable context memory (RCM) as switch blocks; a more detailed architecture is discussed later in this section. Figure 7 shows the structure of the RCM, which consists of fine-grained switch elements (SEs), programmable switches (denoted by P) and input controllers (denoted by C). A programmable switch connects a vertical track to a horizontal track, as shown in Fig. 7(b). An input controller can be programmed to invert its input, as shown in Fig. 7(c). An SE consists of a pass-gate, a multiplexer and two memory bits (D1 and D0), as shown in Fig. 8. As described in the previous section, configuration-bit patterns with redundancy and regularity are frequently used. They can be implemented with much simpler circuits than the conventional circuits, as shown in Figs. 3 and 4. The RCM is designed in such a way that the frequently used configuration-bit patterns are implemented area-efficiently using a single SE. For example, to implement a switch with the configuration-bit pattern shown in the bottom row of Fig. 3, D0 is set to 1 and D1 is set to 0. As another example, to implement a switch with the configuration-bit pattern shown in the bottom row of Fig. 4, D1 is set to 1 and the variable input (U) of the multiplexer is connected to S1. The configuration-bit patterns shown in Fig. 5 are not frequently used and are implemented using several SEs. Figure 9 shows an example that generates the configuration-bit pattern (C3, C2, C1, C0) = (1, 0, 0, 0). Four SEs are sufficient to form the multiplexer. The wires used to form the multiplexer are indicated by thick lines.
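The behavior of a switch element and of the decoder example can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the circuit itself: following the two examples in the text, D1 is assumed to select between the constant D0 (D1 = 0) and the variable input U (D1 = 1), and the helper mux2 is an ideal 2-to-1 multiplexer that stands in for the four SEs of Fig. 9 without reproducing their wiring. All function names are hypothetical.

```python
def switch_element(d1: int, d0: int, u: int = 0) -> int:
    """Behavioral model of one SE: D1 = 0 outputs the constant D0, D1 = 1 passes U."""
    return u if d1 else d0


def mux2(select: int, in0: int, in1: int) -> int:
    """Ideal 2-to-1 multiplexer (stands in for the SEs combined in Fig. 9)."""
    return in1 if select else in0


def pattern(generate):
    """Evaluate G for contexts 3, 2, 1, 0 and return (C3, C2, C1, C0)."""
    return tuple(generate(s1=k // 2, s0=k % 2) for k in (3, 2, 1, 0))


# Constant pattern from a single SE: D1 = 0, D0 = 1.
always_on = pattern(lambda s1, s0: switch_element(d1=0, d0=1))
# Single-ID-bit pattern from a single SE: D1 = 1, U connected to S1.
follows_s1 = pattern(lambda s1, s0: switch_element(d1=1, d0=0, u=s1))
# Two-ID-bit pattern: a multiplexer selected by S1 that passes S0 when S1 = 1.
only_context3 = pattern(lambda s1, s0: mux2(select=s1, in0=0, in1=s0))

print(always_on, follows_s1, only_context3)
```

In this model the last decoder is on only when S1 = S0 = 1, which gives (C3, C2, C1, C0) = (1, 0, 0, 0), the pattern generated in Fig. 9.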
High-speed double-length lines are used in the MC-FPGA to compensate for the routing delay in the RCM. The delay is large if a signal is routed through many SEs in series. Figure 10 shows double-length lines that bypass alternate diamond switches. Each diamond switch connects a line from one direction to three other lines in different directions. The double-length lines are connected to the logic blocks through RCM blocks. Figure 11 shows a diamond switch, which consists of SEs and connects to the RCM through U1 to U6. To achieve a short delay in a mapping, critical paths are routed with double-length lines while non-critical paths are routed through the RCM. To prevent the RCM from degrading the context-switching speed, context-ID bits are routed with high-speed global wires and decoded locally by the RCM.
Figure 8. Structure and function of a switch element (depending on the memory bits D1 and D0, the output G is either a constant or the variable input U)
Figure 9. Combination of switch elements to generate the configuration-bit pattern (C3, C2, C1, C0) = (1, 0, 0, 0)
4. Architecture of an Adaptive Multi-Context Logic Block
Figure 10. Switch-block structure with double-length lines
Figure 11. Structure of the diamond switch
The main component of an adaptive multi-context logic block is a locally controlled multi-context multi-granularity LUT (MCMG-LUT). Figure 12 shows an MCMG-LUT that is programmable to be a 4-input LUT (four different configuration planes) or a 5-input LUT (two different configuration planes), where each dashed box represents a configuration plane. A configuration plane is a group of memory bits that are selected under the same context-ID state. Note that two context-ID bits (S0, S1) are used in the 4-input LUT and only one context-ID bit (S0) is used in the 5-input LUT. Without changing the number of memory bits, the size of an MCMG-LUT [6] can be increased by reducing its number of different configuration planes. Here, the size is the number of computation data that are selected as inputs of the LUT.
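The trade-off between LUT size and the number of configuration planes can be illustrated with a small behavioral model. The sketch below assumes a pool of 64 memory bits, which follows from 16 entries per plane times four planes for the 4-input case; the class name, the address ordering and the example contents are hypothetical.

```python
class McmgLut:
    """Behavioral sketch of an MCMG-LUT with a fixed pool of memory bits."""

    def __init__(self, memory_bits, num_inputs):
        total = len(memory_bits)
        entries_per_plane = 2 ** num_inputs
        if total % entries_per_plane != 0:
            raise ValueError("memory size must be a multiple of 2**num_inputs")
        self.memory_bits = list(memory_bits)
        self.num_inputs = num_inputs
        # Fewer planes remain when each plane needs more entries.
        self.num_planes = total // entries_per_plane

    def read(self, plane, inputs):
        """Look up the output for the given plane and tuple of input bits."""
        if not 0 <= plane < self.num_planes:
            raise IndexError("plane out of range")
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in inputs:  # the first input is the most significant address bit
            address = (address << 1) | bit
        return self.memory_bits[plane * 2 ** self.num_inputs + address]


bits = [0] * 64
bits[63] = 1
as_4_input = McmgLut(bits, num_inputs=4)  # four planes of 16 bits
as_5_input = McmgLut(bits, num_inputs=5)  # two planes of 32 bits
print(as_4_input.num_planes, as_5_input.num_planes)
```

The same 64 memory bits serve either configuration; only the split between plane-select bits and computation inputs changes.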
Figure 12. Number of different configuration planes and the size for a multi-context multi-granularity LUT: (a) 4-input LUT with four different configuration planes; (b) 5-input LUT with two different configuration planes
Figure 13 shows the mapping of data-flow graphs (DFGs) in contexts 1 and 2 into globally controlled MCMG-LUTs. A global control signal (J) programs each of the MCMG-LUTs as a 2-input LUT (two different configuration planes), as shown in Fig. 13(b). Using a global control signal is not area-efficient because redundant configuration data are stored in the MCMG-LUTs. For example, the two configuration planes of LUT3 in Fig. 13(b) store the same configuration data for O3, which is repeated in contexts 1 and 2.
Figure 14 shows the mapping of the DFGs into locally controlled MCMG-LUTs to achieve area efficiency. The DFGs are redrawn as shown in Fig. 14(a), where nodes O2 and O3 are shared between contexts 1 and 2 and the shared nodes are combined as O5. Figure 14(b) shows that two locally controlled MCMG-LUTs are sufficient to map the redrawn DFG, compared with the three globally controlled MCMG-LUTs shown in Fig. 13(b). Each locally controlled MCMG-LUT has a programmable size controller, which would cause area overhead if a dedicated controller were used. To reduce this overhead, the RCM is used to form the controller, which is required only when there are different configuration planes.
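The redundancy that local control removes can be made concrete by counting distinct configuration planes per LUT. The sketch below uses hypothetical 2-input truth tables and LUT names; it does not reproduce the DFGs of Figs. 13 and 14, and only shows that a LUT whose function is repeated across contexts needs a single plane.

```python
def planes_needed(config_per_context):
    """Number of distinct configuration planes a LUT must store."""
    return len(set(config_per_context))


# Hypothetical 2-input truth tables (4 bits each) for contexts 1 and 2.
AND, OR, XOR = (0, 0, 0, 1), (0, 1, 1, 1), (0, 1, 1, 0)
luts = {
    "LUT1": (AND, OR),   # different function in each context
    "LUT2": (XOR, XOR),  # same function repeated in both contexts
}

# Global control: every LUT stores one plane per context.
global_planes = {name: len(cfg) for name, cfg in luts.items()}
# Local control: each LUT stores only its distinct planes.
local_planes = {name: planes_needed(cfg) for name, cfg in luts.items()}

bits_per_plane = 4
global_bits = bits_per_plane * sum(global_planes.values())
local_bits = bits_per_plane * sum(local_planes.values())
print(global_planes, local_planes, global_bits, local_bits)
```

In this toy case local control stores 12 bits instead of 16; in an MCMG-LUT the memory freed in this way can instead be used to enlarge the LUT.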
Figure 13. Mapping of DFGs into globally controlled MCMG-LUTs: (a) DFG for context 1 (S0 = 0) and context 2 (S0 = 1); (b) LUT
5. Evaluation
The proposed MC-FPGA using the RCM and adaptive multi-context logic blocks is compared with a typical MC-FPGA, which uses switch blocks and logic blocks with fixed context memory. We assume that the number of contexts is four and that 6-input 2-output MCMG-LUTs are used. The percentage of changes in configuration data between contexts is assumed to be 5%, based on the fact that less than 3% of new configuration memory bits differ from those already in the configuration memory [4]. Under the constraint of the same number of contexts, the area of the proposed MC-FPGA is 45% of the area of the typical MC-FPGA.
Figure 14. Mapping of a DFG into locally controlled MCMG-LUTs: (a) DFG that considers redundancy between different contexts; (b) LUT
The RCM can be implemented area-efficiently by using ferroelectric-based functional pass-gates (FePGs) [5] as SEs. FePGs are compact because logic and storage functions are merged at the device level. FePGs can also reduce static power consumption because configuration data are stored in non-volatile ferroelectric devices. Figure 15 shows the circuit of an FePG, its equivalent CMOS circuit and its truth table. Like an SE, an FePG selects a constant or a variable input depending on the contents of two memory bits. The area of an FePG-based SE is 50% of that of a CMOS-based SE. The area of the proposed MC-FPGA using FePG-based SEs is estimated to be 37% of that of a typical CMOS-based MC-FPGA.
Figure 15. Ferroelectric-based functional pass-gate: (a) circuit of a ferroelectric-based functional pass-gate; (b) equivalent circuit; (c) truth table
6. Conclusion
An MC-FPGA architecture using reconfigurable context memory and adaptive multi-context logic blocks is proposed to reduce the overhead of configuration memory. Mapping tools that exploit the regularity and redundancy of configuration bits will be investigated in the future to support the architecture.
Acknowledgment
This work was supported in part by the Industrial Technology Research Grant Program from the New Energy and Industrial Technology Development Organization (NEDO) of Japan.
References
[1] A. DeHon. Dynamically programmable gate arrays: a step toward increased computational density. In Proceedings of the Fourth Canadian Workshop on Field-Programmable Devices, pages 47–54, 1996.
[2] S. Trimberger et al. A time-multiplexed FPGA. In FCCM’97 Proceedings, pages 22–28, 1997.
[3] S. Masui et al. A ferroelectric memory-based secure dynamically programmable gate array. IEEE J. Solid-State Circuits, 38(5):715–725, May 2003.
[4] I. Kennedy. Exploiting redundancy to speedup reconfiguration of an FPGA. In FPL, pages 262–271, Sep. 2003.
[5] H. Kimura, T. Hanyu, M. Kameyama, Y. Fujimori, T. Nakamura, and H. Takasu. Complementary ferroelectric-capacitor logic for low-power logic-in-memory VLSI. IEEE J. Solid-State Circuits, 39(6):919–926, June 2004.
[6] T. M. Iida. A proposal of programmable logic architecture for reconfigurable computing. In Proc. of ITC-CSCC, volume 3, pages 1547–1550, 2002.
