Hyperreconfigurable architectures and the partition into hypercontexts problem by Lange, Sebastian & Middendorf, Martin
J. Parallel Distrib. Comput. 65 (2005) 743 – 754
www.elsevier.com/locate/jpdc
Hyperreconfigurable architectures and the partition into hypercontexts
problem
Sebastian Lange, Martin Middendorf∗
Parallel Computing and Complex Systems Group, Department of Computer Science, University of Leipzig, Augustusplatz 10/11, D-04109 Leipzig, Germany
Received 2 August 2004; received in revised form 14 January 2005; accepted 23 January 2005
Available online 13 March 2005
Abstract
Dynamically reconfigurable architectures or systems are able to reconfigure their function and/or structure to suit the changing needs
of a computation during run time. The increasing flexibility of modern dynamically reconfigurable systems improves their adaptability
to computational needs but also makes fast reconfiguration difficult because of the large amount of reconfiguration information which
has to be transferred. However, even when a computation uses this flexibility it will not use it all the time. Therefore, we propose
to make the potential for reconfiguration itself reconfigurable. Such architectures are called hyperreconfigurable. Different models of
hyperreconfigurable architectures are proposed in this paper. We also study a fundamental problem that emerges on such architectures,
namely, to determine for a given computation when and how the potential for reconfiguration should be changed during run time so that
the reconfiguration overhead is minimal. It is shown that the general problem is NP-hard but fast polynomial time algorithms are given
to solve this problem for special types of hyperreconfigurable architectures. We define two example hyperreconfigurable architectures and
illustrate the introduced concepts for corresponding application problems.
© 2005 Elsevier Inc. All rights reserved.
Keywords: Dynamic reconfiguration; Reconfigurable architectures; Context partitioning; Reconfiguration costs
1. Introduction
Dynamically reconfigurable architectures or systems can adapt their function and/or structure to suit the changing needs of a computation during run time (e.g., [2,3]). A principal problem of dynamically reconfigurable systems is the tradeoff between flexibility and the amount of information needed for reconfiguration to define the new state of the system. Moreover, the increasingly higher integration of reconfigurable hardware, e.g. reconfigurable circuits on an FPGA chip, requires increased bandwidths for transferring the reconfiguration information. Modern FPGAs, for example, need several megabytes of reconfiguration data for a single reconfiguration step. This large amount of data transfer makes dynamic reconfigurations time-critical operations, especially for computations which exploit the full capacity of dynamically reconfigurable architectures by frequent reconfigurations.
This work was financed by the German Research Foundation (DFG) through the project “Models and Algorithms for Hyperreconfigurable Architectures” within the priority programme 1148 “Reconfigurable Computing Systems”.
∗ Corresponding author. Fax: +49 3419732329.
E-mail addresses: langes@informatik.uni-leipzig.de (S. Lange), middendorf@informatik.uni-leipzig.de (M. Middendorf).
0743-7315/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.jpdc.2005.01.003
Different approaches have been proposed in the literature to cope with this problem. Dandalis and Prasanna [4] have applied off-line compression methods to the stream of reconfiguration bits. The compressed stream of reconfiguration bits can be loaded faster onto the chip. Additional hardware is necessary on the chip to decompress the reconfiguration bit stream during run time before it is needed to define the next configuration. Another method for compression of the reconfiguration bit stream, which is especially suitable for the Xilinx XC6200 architecture, has been described by Hauck et al. [7]. For multi-FPGA systems, Lee and Wong [11] have proposed to perform the reconfiguration incrementally so that only parts of the FPGAs
need to be reconfigured at the same time. A third approach is to use self-reconfigurability, which means that the reconfiguration bits are computed directly on the chip so that they can be transferred faster to the system units that are reconfigured (see Köster and Teich [9], Sidhu et al. [12], Wadhwa and Dandalis [16]). All these approaches have in common that they do not change the reconfiguration information itself.
In this paper, we propose a new approach to make run
time reconfiguration faster by defining a new type of re-
configurable architectures. We use the fact that algorithms
or computations typically consist of different phases where
during each phase only a fraction of the reconfiguration po-
tential of the underlying architecture is needed. The idea is
to make the reconfiguration potential itself reconfigurable.
The smaller the actual reconfiguration potential of an architecture is, the smaller is the amount of reconfiguration information that has to be transferred during reconfiguration and the faster is a reconfiguration step. We call such architectures hyperreconfigurable architectures.
Hyperreconfigurable architectures use two types of reconfiguration steps: (i) reconfiguration steps where the reconfiguration potential of the architecture is defined, and (ii) standard reconfiguration steps which are used to reconfigure the hardware according to the contexts demanded by the algorithm. Steps of the first type are called hyperreconfiguration steps.
A central problem that emerges on hyperreconfigurable architectures is to determine when hyperreconfiguration steps should be taken and how the reconfiguration potential should be defined in these steps in order to minimize the total time necessary for (hyper)reconfiguration of a computation. We call this problem the Partition into Hypercontexts (PHC) problem and show that it is NP-hard. We also describe polynomial time algorithms for several variants of PHC on the so-called Switch model and DAG model of hyperreconfigurable architectures. To illustrate the ideas in this paper we consider examples of two differently grained hyperreconfigurable architectures. For each of these architectures we study an instance of the PHC problem and give optimal solutions that were derived with the presented algorithms.
The paper is organized as follows. In Section 2, we de-
scribe the concept of hyperreconfigurable architectures. For-
mal models for such architectures and the Partition into Hy-
percontexts (PHC) problem are defined in Section 3. The
NP-Hardness of PHC is shown in Section 4. In Sections 5
and 6 we discuss polynomial time solvable cases of the PHC
problem. A variant of the PHC problem with changeover
costs is studied in Section 7. Experimental results for the
example architectures are presented in Section 8. The paper
ends with a conclusion in Section 9.
2. The concept of hyperreconfigurable architectures
We call dynamically reconfigurable architectures and systems whose reconfiguration potential can be altered during run time hyperreconfigurable architectures. Hyperreconfigurable architectures have two types of reconfiguration steps. The (ordinary) reconfiguration steps are used to actually define a new configuration of the system. The state of the system that can be changed by reconfiguration is called the context of a computation. Hyperreconfiguration steps are used for defining the actual reconfiguration potential of the architecture that is available for the ordinary reconfiguration steps. Thus, a hyperreconfiguration step defines the set of contexts that is available for the (ordinary) reconfiguration steps. Such a set of available contexts is called a hypercontext. By “available” we refer to those reconfigurable resources that are activated by the hypercontext and are therefore available for reconfiguration. If a reconfiguration needs resources that are not included in the hypercontext, they have to be activated/included by a hyperreconfiguration. We assume that a reconfiguration step requires reconfiguration information for all activated resources (even when the information is that an activated resource is not used in the corresponding context). Thus, we are interested in the case where the cost (e.g., the time or the number of bits that have to be loaded onto the architecture) of a reconfiguration step depends on the current hypercontext. Formal models for hyperreconfigurable architectures will be discussed in the next section.
This concept is illustrated in the following example. Con-
sider a switch box for an FPGA where the state of each
switch is determined by the content of a corresponding
SRAM-cell as depicted on the right-hand side of Fig. 1. The
content of these SRAM-cells is the current context of the
switch box.
During reconfiguration the SRAM-cells are chained se-
quentially to form a shift register shifting in one bit of the
new context and shifting out one bit of the old context at
each time step. In order to enable hyperreconfigurability a
second chain of SRAM cells is introduced to store the hyper-
context of the switch box. Each of these cells manipulates
two switches—one in front and one behind its corresponding
(context)-SRAM-cell. These switches control whether the
SRAM-cell is part of the shift register, so that the reconfiguration
bits are sent through the SRAM-cell during reconfiguration,
or is bypassed, thereby excluding it from reconfiguration.
In the example of Fig. 1 two of the three shown hypercon-
text SRAM cells contain a 1 which means the correspond-
ing context SRAM cells are included in the current chain
of context SRAM cells. The other hypercontext SRAM cell
contains a 0 which means the corresponding context SRAM
cell is bypassed and therefore not included in the current
chain of context SRAM cells. Since the time for reconfigu-
ration is determined by the number of bits that have to be
shifted in, it depends directly on the hypercontext. Loading a new hypercontext is done analogously to reconfiguration: the (hypercontext) SRAM-cells form a shift register and the bits of the new hypercontext are shifted in. Since the number of bits in the hypercontext is always the same, the time for a hyperreconfiguration step does not change. In this example it equals the time for a reconfiguration step where all SRAM-cells are included in the hypercontext.
Fig. 1. Design example of a hyperreconfigurable switch box: the example hypercontext has one hypercontext SRAM cell set to 0 so that the corresponding context SRAM cell is bypassed for the context bits that are loaded into the chain of context SRAM cells.
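The dependence of the reconfiguration time on the hypercontext can be mimicked in a few lines of Python. The following is a minimal sketch, not a hardware description: the function names `reconfigure` and `hyperreconfigure`, the list encoding of the SRAM cells, and the convention that one shifted bit costs one time step are assumptions made only for illustration.

```python
def reconfigure(context, hypercontext, new_bits):
    """Load new_bits into the context cells that the hypercontext enables.

    context      -- 0/1 values currently stored in the context SRAM cells
    hypercontext -- 0/1 flags; 1 = cell is in the chain, 0 = cell is bypassed
    new_bits     -- one bit per enabled cell, in chain order
    Returns the new context and the number of shift steps (assumed cost model).
    """
    enabled = [i for i, flag in enumerate(hypercontext) if flag == 1]
    if len(new_bits) != len(enabled):
        raise ValueError("one bit is needed for every enabled cell")
    updated = list(context)
    for i, bit in zip(enabled, new_bits):
        updated[i] = bit  # bypassed cells keep their old content
    return updated, len(enabled)


def hyperreconfigure(new_hypercontext):
    """Load a new hypercontext; every hypercontext cell is always shifted."""
    return list(new_hypercontext), len(new_hypercontext)


context = [0, 1, 1]
hypercontext = [1, 0, 1]  # the middle cell is bypassed
new_context, steps = reconfigure(context, hypercontext, [1, 0])
print(new_context, steps)  # [1, 1, 0] 2
```

In this toy run only two bits are shifted instead of three, while a hyperreconfiguration of the same switch box would always shift three bits.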
Often it will not be possible to determine the context requirements of an algorithm exactly in advance. This is typically the case when a context depends on data that are computed at run time. For the concept of hyperreconfigurable architectures it is enough when an upper bound on the requirements that will actually be needed during run time can be given. For example, it might be possible to know in advance that the routing requirements will be low during a certain phase of the algorithm even when the exact routing is not known in advance. Note that several methods for resource estimation on reconfigurable architectures have appeared in the literature (e.g., see [8]).
3. Formal models and the partition into hypercontexts
problem
In this section we introduce formal models for hyper-
reconfigurable architectures (or machines). The models
which we introduce are general models that allow us to con-
sider general algorithmic aspects for such architectures (and
in particular the PHC problem). For concrete architectures
these models can be made more specific.
We assume that an algorithm or a computation is characterized by a sequence of context requirements. Each context requirement describes the resource requirements that the algorithm/computation will have for a corresponding reconfiguration step that is performed during the run of the algorithm/computation. Hence, the number of context requirements equals the number of reconfiguration steps. Formally, let $\mathcal{C}$ be the set of possible context requirements for a reconfigurable machine. Then an algorithm/computation is characterized by the sequence
$C = c_1 \ldots c_m$
of its context requirements during run time, where $c_i \in \mathcal{C}$ for $i \in [1:m]$. Since the actual reconfiguration steps might depend on data that are only available at run time, a context requirement always specifies the (estimated) maximal set of resources that could possibly be needed. When the meaning is clear we sometimes call the context requirements of an algorithm/computation simply its contexts. The reason is that each context requirement corresponds to exactly one new context that is configured during the run of the algorithm.
A reconfiguration into a new context can in general only be realized during run time when the machine is in a hypercontext that contains at least all contexts possible according to the corresponding context requirement. In this case the hypercontext satisfies the corresponding context requirement. Formally, a hypercontext is a state of the reconfigurable machine which is characterized by the subset of context requirements in $\mathcal{C}$ that are satisfied when the machine is in this state. At any time exactly one hypercontext is realized on the machine. Let $H$ be the set of possible hypercontexts. For a hypercontext $h \in H$ let $h(\mathcal{C}) \subseteq \mathcal{C}$ be the subset of context requirements that are satisfied by $h$. The set $h(\mathcal{C})$ is called the context set of $h$. For a sequence $c_1 \ldots c_k$ of context requirements and a hypercontext $h$ let $c_1 \ldots c_k \subseteq h(\mathcal{C})$ denote the fact that $c_i \in h(\mathcal{C})$ holds for each context requirement $c_i$, $i \in [1:k]$.
In the example hyperreconfigurable switch box (see Section 2) the set of hypercontexts $H$ can be the set of all subsets of switches. A context requirement $c$ can be a subset of the switches. Then for a hypercontext $h \in H$ the relation $c \in h(\mathcal{C})$ holds if $c \subseteq h$, i.e., all switches that are required for $c$ are in the hypercontext $h$. The set of context requirements will usually depend not only on the architecture but also on the application and on how well the needs of the algorithm can be analyzed. For the switch box example the set of context requirements $\mathcal{C}$ can be the set of all subsets of switches. Then $H = \mathcal{C}$. But it is also possible that $\mathcal{C}$ contains only a few subsets of switches, e.g., only the set of all switches and the set of all switches on the diagonal of the switch box.
In order to change the machine's current hypercontext a hyperreconfiguration step is necessary. To measure the costs for hyperreconfiguration steps and reconfiguration steps we introduce for each hypercontext $h \in H$ two cost measures: (i) $\mathrm{init}(h)$ is the cost of performing a hyperreconfiguration that brings the machine into hypercontext $h$, and (ii) $\mathrm{cost}(h)$ denotes the cost of an ordinary reconfiguration step when the machine is in hypercontext $h$. In the example hyperreconfigurable switch box (see Section 2) $\mathrm{init}(h)$ is the time to load the hypercontext bits into the hypercontext SRAM cells and $\mathrm{cost}(h)$ is the time to load the context bits into the chain of context SRAM cells. Note that for this example $\mathrm{cost}(h)$ depends on the current number of SRAM cells that are in the chain and are not bypassed.
A computation is characterized by a partition of $C$ into substrings $S_1, \ldots, S_r$ (i.e. $C = S_1 \ldots S_r$) and hypercontexts $h_1, \ldots, h_r$, $r \ge 1$, such that $S_i \subseteq h_i(\mathcal{C})$ for $i \in [1:r]$. Its total costs are
$$\sum_{i=1}^{r} \bigl(\mathrm{init}(h_i) + \mathrm{cost}(h_i) \cdot |S_i|\bigr),$$
where $|S_i|$ is the length of $S_i$, i.e., the number of context requirements in $S_i$. When the algorithm/computation is executed the machine performs the following reconfiguration operations: $h_1 S_1 \ldots h_r S_r$, where $S_i$ stands for a sequence of $|S_i|$ reconfigurations which use only those parts of the machine which are available within the hypercontext $h_i$. It is assumed that a hyperreconfiguration is always performed before the first reconfiguration step.
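As a small illustration of this cost measure, the following Python sketch evaluates the total cost of a given partition. It is only a sketch under assumptions: the function name `total_cost`, the dictionary encoding of context sets, and the toy hypercontexts `small` and `full` with their cost values are hypothetical and not taken from the paper.

```python
def total_cost(segments, hypercontexts, context_sets, init, cost):
    """Sum of init(h_i) + cost(h_i) * |S_i| over all segments S_i.

    segments      -- list of non-empty lists of context requirements
    hypercontexts -- one hypercontext name per segment
    context_sets  -- dict: hypercontext name -> set of satisfied requirements
    init, cost    -- dicts: hypercontext name -> cost value
    """
    if not segments or len(segments) != len(hypercontexts):
        raise ValueError("need r >= 1 segments and one hypercontext per segment")
    total = 0
    for segment, h in zip(segments, hypercontexts):
        if not segment:
            raise ValueError("segments must be non-empty")
        if any(c not in context_sets[h] for c in segment):
            raise ValueError(f"hypercontext {h!r} does not satisfy its segment")
        total += init[h] + cost[h] * len(segment)
    return total


# Hypothetical toy machine with two hypercontexts.
context_sets = {"small": {"a", "b"}, "full": {"a", "b", "c"}}
init = {"small": 5, "full": 5}
cost = {"small": 2, "full": 3}

two_parts = total_cost([["a", "b"], ["c", "c"]], ["small", "full"],
                       context_sets, init, cost)
one_part = total_cost([["a", "b", "c", "c"]], ["full"],
                      context_sets, init, cost)
print(two_parts, one_part)  # 20 17
```

With these made-up numbers a single hyperreconfiguration into the larger hypercontext is cheaper than two hyperreconfigurations, which shows why the choice of the partition matters.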
Fig. 2 shows an example of a computation on a hyperreconfigurable machine. Assume the machine contains 22 hyperreconfigurable resources $r_1, \ldots, r_{22}$, drawn as horizontal lines of boxes. The computation consists of a string $S = c_1 \ldots c_7$ of 7 context requirements. Two hyperreconfiguration operations partition $S$ into two substrings $S_1 = c_1 c_2 c_3$ and $S_2 = c_4 c_5 c_6 c_7$. During hyperreconfiguration resources are selectively enabled or disabled. Hypercontext $h_1$ disables resources $r_3, r_5, r_6, r_7, r_{14}, r_{15}, r_{16}, r_{18}$ and $r_{19}$. Observe that none of the context requirements in substring $S_1$ specifies reconfiguration data for these resources, so all of them are satisfied by $h_1$. Likewise hypercontext $h_2$ disables resources $r_1, r_2, r_4, r_6, r_{13}, r_{14}, r_{16}, r_{17}, r_{18}$ and $r_{19}$ and satisfies all context requirements in substring $S_2 = c_4 c_5 c_6 c_7$.
An important problem that emerges for a hyperreconfig-
urable machine and a given algorithm (i.e. a sequence of
context requirements) is to define when hyperreconfigura-
tions are done and how corresponding hypercontexts are de-
fined such that the context requirements of the algorithm
are satisfied and the total costs for the hyperreconfiguration
steps and the ordinary reconfiguration steps are minimized.
The PHC problem can be defined as follows.
3.1. Partition into Hypercontexts (PHC) problem
Given a hyperreconfigurable machine (as described above) and a sequence $C = c_1 \ldots c_m$ of context requirements, find a partition of $C$ into substrings $S_1, \ldots, S_r$ (i.e., $C = S_1 \ldots S_r$) and hypercontexts $h_1, \ldots, h_r$, $r \ge 1$, such that $S_i \subseteq h_i(\mathcal{C})$ and the total (hyper)reconfiguration costs are minimal.
Fig. 2. Snapshot of a sample computation on a hyperreconfigurable machine with 7 context requirements $c_1 \ldots c_7$, where 2 hyperreconfigurations into hypercontexts $h_1$ and $h_2$ are done; computations within the respective current context are done between the reconfiguration operations; “data set” denotes the resources that are used within the context, “data not set” denotes the resources that are available within the current hypercontext but are not used by the context.
Two variants of the model for hyperreconfigurable architectures are introduced. The DAG-model is for coarse-grained reconfigurable machines where different reconfigurable submachines (hypercontexts) can be defined (through hyperreconfiguration) that can be ordered with respect to their computational power. It is assumed that a submachine which is an extension of another submachine, and which therefore has a larger computational power, incurs higher reconfiguration costs. This model is typically used for coarse-grained reconfigurable machines.
A directed acyclic graph (DAG) describes the precedence relation between the hypercontexts. Formally, we are given a DAG $G = (V, E)$ with $V = H$ and for each $h \in H$ a set $h(\mathcal{C})$ such that for each edge $(h_1, h_2) \in E$ the relation $h_1(\mathcal{C}) \subseteq h_2(\mathcal{C})$ holds. It is assumed that a hypercontext $h$ exists that satisfies all possible context requirements, i.e. $h(\mathcal{C}) = \mathcal{C}$. In addition let $\mathrm{cost}(h) > 0$ and $\mathrm{init}(h) = w$ for each $h \in H$ and a constant $w \ge 0$, such that for each edge $(h_1, h_2) \in E$ the relation $\mathrm{cost}(h_1) \le \mathrm{cost}(h_2)$ holds. Then a computation is characterized by a partition of $C$ into substrings $S_1, \ldots, S_r$, $r \ge 1$ (i.e. $C = S_1 \ldots S_r$) and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(\mathcal{C})$, and the total (hyper)reconfiguration costs are
$$r \cdot w + \sum_{i=1}^{r} \mathrm{cost}(h_i) \cdot |S_i|.$$
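The conditions that the DAG-model places on the edges can be checked mechanically. The sketch below assumes that the cost condition along an edge $(h_1, h_2)$ reads $\mathrm{cost}(h_1) \le \mathrm{cost}(h_2)$; the function name `dag_model_violations` and the three toy hypercontexts `A`, `B` and `AB` are hypothetical.

```python
def dag_model_violations(context_sets, cost, edges):
    """Return the edges (h1, h2) that break the assumed DAG-model conditions.

    An edge is accepted when h1's context set is a subset of h2's and
    0 < cost(h1) <= cost(h2).
    """
    violations = []
    for h1, h2 in edges:
        subset_ok = context_sets[h1] <= context_sets[h2]
        cost_ok = 0 < cost[h1] <= cost[h2]
        if not (subset_ok and cost_ok):
            violations.append((h1, h2))
    return violations


# Hypothetical machine: two small submachines and one that extends both.
context_sets = {"A": {"a"}, "B": {"b"}, "AB": {"a", "b"}}
cost = {"A": 1, "B": 1, "AB": 3}

print(dag_model_violations(context_sets, cost, [("A", "AB"), ("B", "AB")]))  # []
print(dag_model_violations(context_sets, cost, [("AB", "A")]))  # [('AB', 'A')]
```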
For each context requirement $c \in \mathcal{C}$ let $c(H)$ be the set of minimal (with respect to the precedence relation defined by $E$) hypercontexts $h$ in the DAG which satisfy $c \in h(\mathcal{C})$. The PHC problem for the DAG-model can be defined as follows.
3.2. PHC-DAG problem
Given a hyperreconfigurable machine in the DAG-model and a sequence $C = c_1 \ldots c_m$ of context requirements, find a partition of $C$ into substrings $S_1, \ldots, S_r$, $r \ge 1$ (i.e. $C = S_1 \ldots S_r$) and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(\mathcal{C})$ and the total (hyper)reconfiguration costs are minimal.
The second variant of a hyperreconfigurable machine is called the Switch-model. In contrast to the DAG-model, which is mainly suited to coarse-grained hyperreconfigurable machines, the Switch-model is also well suited for fine-grained hyperreconfigurable machines. Here we assume that there exists a set of small (similar) reconfigurable units and every subset of these units can be used to define the reconfigurable machine that is available during a hypercontext. For example, each unit might be a switch and the set of switches defines the available part of the reconfigurable machine. An example is given by the hyperreconfigurable switch boxes (see Section 2) of an FPGA which are used for connecting the functional units. The larger the (possible) routing requirements of an algorithm for a context, the more switches should be available in the hypercontext for reconfiguration during run time. For reconfiguration the state of each available switch has to be defined. Thus the cost for reconfiguration is just the number of available units plus some overhead cost. Hence, for this Switch-model the reconfigurable machine that is available during a hypercontext is defined by the subset of available units.
Formally, let $X = \{x_1, \ldots, x_n\}$ be a set of switches and define $\mathcal{C} = H = 2^X$, i.e., the set of possible context requirements $\mathcal{C}$ and the set of possible hypercontexts $H$ both equal the set of all subsets of $X$. For a context requirement $c \in \mathcal{C}$ the relation $c \in h(\mathcal{C})$ holds when $c \subseteq h$. Let $\mathrm{cost}(h) = |h|$, where $|h|$ is the size of $h$, i.e., the number of switches available in $h$. Let $\mathrm{init}(h) = n$ for $h \in H$, which reflects the fact that for each switch it has to be defined during hyperreconfiguration whether it is available in the new hypercontext. A computation is characterized by a partition of $C$ into substrings $S_1, \ldots, S_r$, $r \ge 1$ (i.e., $C = S_1 \ldots S_r$) and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(\mathcal{C})$, and the total (hyper)reconfiguration costs are
$$r \cdot n + \sum_{i=1}^{r} |h_i| \cdot |S_i|.$$
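For the Switch-model the cost of a given partition is easy to evaluate, because the smallest hypercontext that satisfies a substring is the union of its context requirements. The following sketch assumes that each substring uses this smallest hypercontext; the function name `switch_model_cost` and the four example requirements are hypothetical.

```python
def switch_model_cost(segments, n):
    """Cost r * n + sum(|h_i| * |S_i|) when each h_i is the union of S_i.

    segments -- list of non-empty lists of sets of switches
    n        -- total number of switches (cost of one hyperreconfiguration)
    """
    total = 0
    for segment in segments:
        if not segment:
            raise ValueError("segments must be non-empty")
        h = set().union(*segment)  # smallest hypercontext covering the segment
        total += n + len(h) * len(segment)
    return total


# Hypothetical sequence of four context requirements over switches 1..4.
c1, c2, c3, c4 = {1}, {1, 2}, {3, 4}, {3}
print(switch_model_cost([[c1, c2], [c3, c4]], n=4))  # 16
print(switch_model_cost([[c1, c2, c3, c4]], n=4))    # 20
```

Here the extra hyperreconfiguration pays off because each half of the sequence needs only two of the four switches.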
The PHC problem for the Switch-model can be defined
as follows.
3.3. PHC-Switch problem
Given a hyperreconfigurable machine in the Switch-model with set of switches $X = \{x_1, \ldots, x_n\}$ and a sequence of context requirements $C = c_1 \ldots c_m$, find a partition of $C$ into substrings $S_1, \ldots, S_r$, $r \ge 1$ (i.e. $C = S_1 \ldots S_r$) and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(\mathcal{C})$ and the total (hyper)reconfiguration costs are minimal.
Note that for the PHC-Switch problem there exist $2^n$ hypercontexts, but this number is not part of the size of the problem instance, which is $n + m$. This is different for the PHC-DAG problem, where the DAG is part of the instance and therefore the number of possible hypercontexts is also part of the instance.
In order to solve the PHC problem we make a simple but useful observation. A partial order $\preceq$ on the set of hypercontexts (i.e., $\preceq$ is a reflexive, antisymmetric, and transitive relation on $H$) can be defined naturally by the subset relation on the sets of contexts that are satisfied by the hypercontexts. Thus, $h_1 \preceq h_2$ iff $h_1(\mathcal{C}) \subseteq h_2(\mathcal{C})$, and $h_1 \prec h_2$ iff $h_1(\mathcal{C}) \subseteq h_2(\mathcal{C}) \wedge h_1(\mathcal{C}) \ne h_2(\mathcal{C})$. The partial order $\preceq$ is called cost consistent when for each two hypercontexts $h_1, h_2 \in H$ with $h_1 \prec h_2$ it follows that $\mathrm{init}(h_1) \le \mathrm{init}(h_2)$ and $\mathrm{cost}(h_1) \le \mathrm{cost}(h_2)$, and $<$ holds in at least one case.
Observation. If $\preceq$ is cost consistent then for a solution of PHC, i.e., a partition of $C$ into substrings $S_1, \ldots, S_r$ and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(\mathcal{C})$, each hypercontext $h_i$, $i \in [1:r]$, is minimal within the set of all hypercontexts $h \in H$ with $S_i \subseteq h(\mathcal{C})$ (i.e., there exists no hypercontext $h \in H$, $h \ne h_i$, with $S_i \subseteq h(\mathcal{C})$ and $h \prec h_i$).
It is easy to see that the relation $\preceq$ is cost consistent for the DAG-model and the Switch-model.
4. NP-Hardness
In this section we state that the general PHC problem is
NP-complete, which means it is unlikely that the problem
can be solved in polynomial time.
Theorem 1. The PHC problem is NP-complete.
We only give the proof idea because the proof itself is somewhat technical and uses standard constructions for proving NP-completeness. For a proof one can encode an instance of an NP-hard problem, say 3-SAT, in a sequence of contexts $C$. Then a cost function and a set of hypercontexts can be defined such that there exists a cheap partition into hypercontexts of $C$ if and only if the partition consists of a single hypercontext and the contexts in $C$ encode an instance of 3-SAT that is solvable such that there exists no partition of $C$ into substrings which can be covered by hypercontexts in a cheap way.
5. Polynomial time algorithm for PHC-DAG
The algorithm for the PHC-DAG problem which is described in this section is a dynamic programming algorithm that computes a table $M = (M_{k,j})_{k \in [1:m],\, j \in [k:m]}$ where $M_{k,j}$ are the minimal costs for the prefix of length $j$ of the sequence of context requirements $c_1 \ldots c_m$ when using $k$ hypercontexts. The optimal solution for the PHC-DAG problem can then be derived with standard dynamic programming techniques from this matrix.
In the following let $h_{i,j}$ be a cheapest hypercontext that satisfies the context requirements $c_i, \ldots, c_j$. The algorithm for the PHC-DAG problem consists of the following steps:
(1) Preprocessing: For each $i, j \in [1:m]$, $i \le j$, the cost $\mathrm{cost}(h_{i,j})$ is computed.
(2) Initialization: Every element in the first row of the matrix is determined, i.e. $M_{1,j} := w + \mathrm{cost}(h_{1,j}) \cdot j$ for $j \in [1:m]$.
(3) Computation of $M_{k,j}$ for $k \in [2:m]$, $j \in [k:m]$ according to
$$M_{k,j} = \min\{M_{k-1,p-1} + w + \mathrm{cost}(h_{p,j}) \cdot (j - p + 1) \mid p \in [k:j]\}.$$
(4) Computation of the quality of the optimal solution from the matrix $M$ by determining
$$\min\{M_{k,m} \mid k \in [1:m]\}.$$
Run time analysis: The preprocessing step takes time $O(m^2 \cdot \tau)$, where $\tau$ is the cost of computing, from two sets of hypercontexts in the DAG, the set of all minimal hypercontexts that are predecessors of at least one hypercontext in both sets, and of then determining the cheapest of these hypercontexts. Initialization takes time $O(m \cdot \tau)$ since each element can be determined in time $O(\tau)$. For step (3) the algorithm considers an element $M_{k,j}$ with $k > 1$ and assumes that all elements in row $k-1$ have already been determined. Then it is clear that the computation of a single element in (3) takes time at most $O(j)$. Step (4) takes time $O(m)$. Hence we can derive the following theorem:
Theorem 2. The PHC-DAG problem can be solved in time $O(m^3 + \tau \cdot m^2)$.
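The four steps can be sketched in Python as follows. This is a minimal illustration under assumptions, not the implementation used by the authors: the cheapest hypercontext for a substring is found by a brute-force scan over all hypercontexts instead of a traversal of the DAG, the last substring $c_p \ldots c_j$ is charged for its length $j - p + 1$, and the function name `phc_dag` as well as the toy hypercontexts are hypothetical.

```python
import math


def phc_dag(seq, hypercontexts, w):
    """Minimal total cost for the sequence seq (needs len(seq) >= 1).

    seq           -- list of context requirements c_1 ... c_m
    hypercontexts -- dict: name -> (set of satisfied requirements, cost)
    w             -- constant cost of one hyperreconfiguration
    """
    m = len(seq)
    # Step 1: seg_cost[i][j] = cost of a cheapest hypercontext for c_i..c_j.
    seg_cost = [[math.inf] * (m + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        needed = set()
        for j in range(i, m + 1):
            needed.add(seq[j - 1])
            seg_cost[i][j] = min(
                (c for ctx, c in hypercontexts.values() if needed <= ctx),
                default=math.inf,
            )
    # Step 2: first row, one hypercontext for the whole prefix.
    M = [[math.inf] * (m + 1) for _ in range(m + 1)]
    for j in range(1, m + 1):
        M[1][j] = w + seg_cost[1][j] * j
    # Step 3: the k-th hypercontext covers c_p..c_j.
    for k in range(2, m + 1):
        for j in range(k, m + 1):
            M[k][j] = min(
                M[k - 1][p - 1] + w + seg_cost[p][j] * (j - p + 1)
                for p in range(k, j + 1)
            )
    # Step 4: best number of hypercontexts for the full sequence.
    return min(M[k][m] for k in range(1, m + 1))


# Hypothetical DAG-model machine: A and B are extended by AB.
H = {"A": ({"a"}, 1), "B": ({"b"}, 1), "AB": ({"a", "b"}, 3)}
print(phc_dag(list("aaabbb"), H, w=2))    # 10
print(phc_dag(list("aaabbb"), H, w=100))  # 118
```

With a cheap hyperreconfiguration the sketch switches from `A` to `B` in the middle of the sequence; with an expensive one it stays in `AB` throughout.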
6. Polynomial time algorithm for PHC-Switch
In this section we describe a dynamic programming solution for the PHC-Switch problem. The algorithm computes a table $M = (M_{k,j})_{k \in [1:m],\, j \in [k:m]}$ where $M_{k,j}$ are the minimal costs for the prefix of length $j$ of the sequence of context requirements $c_1 \ldots c_m$ when using $k$ hypercontexts. The optimal solution for PHC-Switch can then be derived from this matrix. This algorithm is designed such that each row of the matrix can be determined in time $O(n \cdot m)$ so that the total run time is in $O(n \cdot m^2)$.
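As a point of reference for the algorithm developed in this section, a straightforward dynamic program for PHC-Switch can be sketched as follows. It is deliberately simpler than the pointer-based algorithm described here: it does not keep the row index $k$ and only stores the best cost per prefix, and it tries every possible start of the last substring. The function name `phc_switch` and the example sequence are hypothetical.

```python
import math


def phc_switch(contexts, n):
    """Return (minimal cost, substrings as 1-based (start, end) pairs).

    contexts -- list of sets of switches c_1 ... c_m
    n        -- number of switches, i.e. the cost of one hyperreconfiguration
    """
    m = len(contexts)
    best = [0] + [math.inf] * m   # best[j]: minimal cost of prefix c_1..c_j
    start = [0] * (m + 1)         # start[j]: first index of the last substring
    for j in range(1, m + 1):
        union = set()
        for p in range(j, 0, -1):          # last substring is c_p..c_j
            union |= contexts[p - 1]       # h_{p,j} = c_p | ... | c_j
            candidate = best[p - 1] + n + len(union) * (j - p + 1)
            if candidate < best[j]:
                best[j], start[j] = candidate, p
    # Walk back through the stored starts to recover the partition.
    substrings = []
    j = m
    while j > 0:
        substrings.append((start[j], j))
        j = start[j] - 1
    substrings.reverse()
    return best[m], substrings


print(phc_switch([{1}, {1, 2}, {3, 4}, {3}], n=4))  # (16, [(1, 2), (3, 4)])
```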
In the following let $h_{i,j}$ be a cheapest hypercontext that satisfies the context requirements $c_i, \ldots, c_j$. First, we need some facts and definitions. It is not hard to show for each $k \in [1:m]$: (i) the value of $M_{k,p}$ is monotone increasing in $p$, and (ii) for $j \in [k:m]$ the value of $\mathrm{cost}(h_{i,j})$ is monotone decreasing in $i$. Let $j \in [k:m]$. It follows from the stated facts that there exists a partition $T_1, \ldots, T_h$ of the sequence of context requirements $c_k \ldots c_j$ such that $c_k \ldots c_j = T_1 \ldots T_h$ and for each string of contexts $T_s$, $s \in [1:h]$, the following holds: for all contexts $c_t \in T_s$ the hypercontexts $h_{t,j}$, and therefore the costs $\mathrm{cost}(h_{t,j})$, are the same. Recall that $h_{t,j}$ for the PHC-Switch problem is defined as the hypercontext that consists of all switches that are elements of at least one of the context requirements $c_t, \ldots, c_j$, i.e., $h_{t,j} = \bigcup_{i=t}^{j} c_i$. We call the partition $T_1, \ldots, T_h$ the equal cost partition of $[k:j]$. The corresponding intervals of indices of the contexts are called the equal cost intervals.
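The equal cost partition can be computed directly from its definition. The sketch below scans the indices from $j$ down to $k$, grows the union $h_{t,j}$, and groups neighboring indices with the same cost; since the unions are nested, equal cost implies equal hypercontext. The function name `equal_cost_partition` and the example sequence are hypothetical, and the sketch does not use the pointer structure of the algorithm.

```python
def equal_cost_partition(contexts, k, j):
    """Equal cost intervals of [k:j] as (first, last, cost) triples.

    contexts -- list of sets of switches c_1 ... c_m (indices are 1-based)
    Requires 1 <= k <= j <= len(contexts).
    """
    union = set()
    cost = {}
    for t in range(j, k - 1, -1):
        union |= contexts[t - 1]   # h_{t,j} = c_t | ... | c_j
        cost[t] = len(union)       # cost(h_{t,j}) in the Switch-model
    intervals = []
    first = k
    for t in range(k + 1, j + 1):
        if cost[t] != cost[first]:
            intervals.append((first, t - 1, cost[first]))
            first = t
    intervals.append((first, j, cost[first]))
    return intervals


contexts = [{1, 2}, {2}, {1}, {3}, {3}]
print(equal_cost_partition(contexts, k=1, j=5))
# [(1, 2, 3), (3, 3, 2), (4, 5, 1)]
```

The costs of the intervals decrease from left to right, in line with fact (ii).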
Let $[s:t]$ be an equal cost interval. For index $x \in [s:t]$ the values $\lambda \in [1:n]$ are determined for which $M_{k,x-1} + \lambda \cdot (t - (x-1)) = \min\{M_{k,y-1} + \lambda \cdot (t - (y-1)) \mid y \in [s:t]\}$ holds. Clearly, for each index $x \in [s:t]$ the corresponding $\lambda$ values form a subinterval of $[1:n]$. This interval is called the minimum cost interval of index $x$ (within the equal cost interval $[s:t]$) and is denoted by $I_x$. It is not hard to show that $I_s, \ldots, I_t$ is a partition of $[1:n]$ where all elements in $I_i$ are smaller than all elements in $I_{i+1}$ for $i \in [s:t-1]$.
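The minimum cost intervals can be illustrated by brute force: for every candidate cost value in $[1:n]$ (called `lam` below) the index with the smallest value of the expression above is determined. This is only a sketch under assumptions: the list `prev` is a hypothetical stand-in for the table entries $M_{k,x-1}$, and ties are broken in favor of the smaller index, which the text does not specify.

```python
def minimum_cost_intervals(prev, s, t, n):
    """Map each index x in [s:t] to the cost values for which x is best.

    prev -- prev[x - 1] stands in for M_{k,x-1} (assumed given)
    s, t -- bounds of the equal cost interval (1-based, inclusive)
    n    -- number of switches, so cost values range over 1..n
    """
    intervals = {x: [] for x in range(s, t + 1)}
    for lam in range(1, n + 1):
        # min() returns the first minimizer, i.e. the smallest index on ties.
        best_x = min(range(s, t + 1),
                     key=lambda x: prev[x - 1] + lam * (t - (x - 1)))
        intervals[best_x].append(lam)
    return intervals


print(minimum_cost_intervals([0, 3, 10], s=1, t=3, n=6))
# {1: [1, 2, 3], 2: [4, 5, 6], 3: []}
```

In this made-up example the interval of index 3 is empty, and the non-empty intervals are ordered from left to right as stated above.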
In the following we describe the computation of a single matrix element in the main step of the algorithm. We assume that all elements in row 1 of $M$ and all elements $M_{k,k} = k \cdot w + \sum_{i=1}^{k} |c_i|$, $k \in [1:m]$, have been computed during initialization. It is enough to consider the computation of an element $M_{k,j+1}$ for $k > 1$ and $j \in [1:m-1]$ assuming that the elements in row $k-1$ and the element $M_{k,j}$ have already been computed.
In order to search efficiently for possible good places to introduce the $k$th hyperreconfiguration we introduce a pointer structure over parts of the sequence of context requirements $c_1 \ldots c_m$. First we describe the pointer structure over the sequence $c_k \ldots c_j$ for the computation of $M_{k,j}$ and then show how it can be extended to a pointer structure over the sequence $c_k \ldots c_{j+1}$ for the computation of $M_{k,j+1}$.
The first context requirements in each of the sequences of context requirements $T_h, \ldots, T_1$ are linked by so-called equal cost pointers, i.e. there is a pointer to the first context requirement in $T_h$, from there to the first context requirement in $T_{h-1}$ and so forth. Moreover, within each equal cost interval the indices $x$ whose minimum cost interval is nonempty and contains at least one value that is not smaller than the actual costs $\mathrm{cost}(h_{x,j+1})$ are linked in order of increasing value by so-called minimum cost pointers. In addition, there is a pointer from the first context requirement of the interval to the last useful index in the interval. This pointer is called the end pointer of the equal cost interval. All indices within an equal cost interval that are linked by minimum cost pointers are called useful. All other indices are called useless and will be marked as useless by the algorithm. The following two facts, which are not hard to show, are used for the run time analysis and to show the correctness of the algorithm.
Fact 1. It is easy to obtain the equal cost partition $U_1, \ldots, U_g$ of $[k:j+1]$ (i.e., $c_k \ldots c_{j+1} = U_1 \ldots U_g$) and the corresponding pointers from the equal cost partition $T_1, \ldots, T_h$ of $[k:j]$ and its corresponding pointers in time $O(n)$.
To see that this is true observe that each string in $U_1, \ldots, U_g$ can be obtained by merging (or copying) neighboring strings from $T_1, \ldots, T_h$, and $U_g$ contains in addition the context requirement $c_{j+1}$.
Fact 2. Consider an element $T_s$ of the equal cost partition $T_1, \ldots, T_h$ of $[k:j]$. Let $c_x$ ($c_y$) be the context in $T_s$ (respectively in the element of the equal cost partition of $[k:j+1]$ that contains $T_s$) for which $M_{k,x-1} + \mathrm{cost}(h_{x,j})$ (respectively $M_{k,y-1} + \mathrm{cost}(h_{y,j+1})$) is minimal. Then it
follows thatxy.
To compute $M_{k,j+1}$ the algorithm performs the following steps:
(i) Extend the equal cost partition of $[k:j]$ by appending the (preliminary) equal cost interval $c_{j+1}$ and let $[1:n]$ be the (preliminary) minimum cost interval for $j+1$.
(ii) Compute the equal cost partition of $[k:j+1]$ from the extended equal cost partition of $[k:j]$ by merging neighboring intervals when they have the same cost with respect to $j+1$.
(iii) For each index within a merged interval the new equal cost interval is determined together with its minimum cost pointers and its end pointer. During this process all indices that have become useless are marked.
Clearly step (i) can be done in time O(1). The determination
of the intervals that have the same costs in step (ii) is
done in time O(n) by following the pointers that connect the
intervals. To determine the time for step (iii) consider an equal
cost interval [s_0 : s_h], k ≤ s_1 ≤ s_h ≤ j + 1, that was merged
from h ≤ n old intervals [s_0 : s_1], [s_1 + 1 : s_2], ..., [s_{h−1} + 1 : s_h].
We now show that the computation of new pointers and
the marking of useless indices takes time O(h + q) where
q is the number of marked indices.
(a) For each of the h intervals consider the minimum
cost interval of the index to which the first minimum cost
pointer points. If the minimum cost interval does not contain
a value that is at least as large as cost(h_{s,j+1}) then the index
is marked as useless and the first pointer is merged with the
next pointer. This process proceeds until every first minimum
cost pointer points to a useful index.
(b) Now it remains to update the minimum cost intervals
by selecting for each cost value only the best index from
the h merged intervals. This can be done in a left to right
manner starting with the smaller cost values. In doing so, the
corresponding minimum cost intervals of indices are always
compared between two neighboring merged intervals,
say [s_{i−1} + 1 : s_i] and [s_i + 1 : s_{i+1}], i ∈ [1 : h − 1].
For ease of description we assume here that all values in one
minimal cost interval are better than all values in the other
interval. If this is not the case both minimum cost intervals
are split so that each contains only the values for which it
is better. Observe that the split value can be computed in
constant time. When the minimum cost interval in the left
interval [s_{i−1} + 1 : s_i] is better, the corresponding index in
the right interval is marked useless and the next minimum
cost intervals are compared. When the minimum cost interval
in the right interval [s_i + 1 : s_{i+1}] is better, the index in
the left interval is marked useless. Then the minimum cost
interval in the old right interval (now the new left interval)
is compared with the corresponding minimum cost interval
of its right neighbor interval [s_{i+1} + 1 : s_{i+2}]. During the
search for the corresponding minimum cost interval all indices
that are passed are marked useless. The process stops
when the best minimum cost interval with value n is found.
During the search a pointer is set from the rightmost useful
index of an interval to the first useful index in its right
neighbor. It might be necessary to jump over intervals
that have no useful index left. The end pointer of the
first interval is set to point to the last useful index of the
merged intervals.
Since the total number of intervals in the equal cost partition
for [k : j + 1] is at most n minus the number of merged
intervals, the time to compute M_{k,j+1} is at most O(n + q)
where q is the total number of indices that are marked useless.
Since at most m − k indices exist in row k of matrix
M it follows that the total computation time of all steps (iii) for
computing the elements in this row is O(n · m + m).
Theorem 3. The PHC-Switch problem can be solved in time
O(n · m^2).
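For checking results on small instances, a straightforward dynamic program can serve as a reference. The sketch below is not the interval and pointer procedure described above. It assumes that the objective is r · w + Σ |h_i| · |S_i| (the changeover-free part of the objective stated in Section 7), that init(h) = w for every hypercontext, and that the cheapest hypercontext for a block is the union of its context requirements. The function name and the representation of context requirements as Python sets are illustrative.

```python
import math
from typing import List, Set, Tuple


def phc_switch_naive(contexts: List[Set[int]], w: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (minimal cost, blocks) where blocks are 1-based inclusive
    index ranges (x, j) of the substrings S_1, ..., S_r."""
    m = len(contexts)
    best = [0] + [math.inf] * m   # best[j]: minimal cost for c_1 ... c_j
    cut = [0] * (m + 1)           # cut[j]: start index of the last block
    for j in range(1, m + 1):
        union: Set[int] = set()
        for x in range(j, 0, -1):             # last block is c_x ... c_j
            union |= contexts[x - 1]
            cost = best[x - 1] + w + len(union) * (j - x + 1)
            if cost < best[j]:
                best[j], cut[j] = cost, x
    blocks = []
    j = m
    while j > 0:                               # backtrack the chosen cuts
        blocks.append((cut[j], j))
        j = cut[j] - 1
    blocks.reverse()
    return best[m], blocks


if __name__ == "__main__":
    demo = [{1}, {1}, {2}, {2}]
    print(phc_switch_naive(demo, w=1))    # cheap hyperreconfigurations
    print(phc_switch_naive(demo, w=10))   # expensive hyperreconfigurations
```

With cheap hyperreconfigurations the sequence is split where the used switches change; with expensive ones a single hypercontext containing both switches is kept.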
7. PHC with changeover costs
In this section we study a variant of the PHC problem
where the cost for a hyperreconfiguration depends not only
on the new hypercontext but also on the predecessor hyper-
context. Parts of the hyperreconfiguration costs can then be
considered as changeover costs and therefore we call this
problem the PHC problem with changeover costs. This prob-
lem is used to model architectures where during hyperrecon-
figuration it is not necessary to specify the new hypercon-
text from scratch but where it is possible to define the new
hypercontext through its difference to the old hypercontext.
For this problem the changeover costs between two hy-
percontexts are defined as the number of switches for which
the state has to be changed for the new hypercontext (i.e.,
the state is changed from available to not available or vice
versa). Formally, the problem is defined as follows.
PHC-Switch with changeover costs problem: Given an
instance of the PHC-Switch problem, where init(h) = w
for h ∈ H, w > 0, the cost function changeover on H × H
is defined by changeover(h_1, h_2) := |h_1 Δ h_2| where Δ
denotes the symmetric difference, and an initial hypercontext
h_0 ∈ H. Find a partition of C into substrings S_1, ..., S_r,
r ≥ 1 (i.e. C = S_1 ... S_r) and hypercontexts h_1, ..., h_r such
that S_i ⊂ h_i(C) and
r · w + Σ_{i=1}^{r} (|h_i Δ h_{i+1}| + |h_i| · |S_i|)
is minimized.
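To make the objective concrete, the following sketch evaluates the cost of one given solution. It reads the changeover term as the symmetric difference between each hypercontext and its predecessor, starting from the initial hypercontext h_0; this reading follows the prose description, while the printed sum indexes the term as h_i, h_{i+1}. Hypercontexts and context requirements are modelled as Python sets of switch identifiers, and all names are illustrative.

```python
from typing import List, Set


def changeover(h1: Set[int], h2: Set[int]) -> int:
    """Number of switches whose availability differs between h1 and h2."""
    return len(h1 ^ h2)          # ^ is the symmetric difference on sets


def total_cost(h0: Set[int], hypercontexts: List[Set[int]],
               blocks: List[List[Set[int]]], w: int) -> int:
    """Cost of using hypercontexts[i] for the context requirements in blocks[i]."""
    total = 0
    prev = h0
    for h, block in zip(hypercontexts, blocks):
        if not all(c <= h for c in block):
            raise ValueError("hypercontext does not cover its context requirements")
        total += w + changeover(prev, h) + len(h) * len(block)
        prev = h
    return total


if __name__ == "__main__":
    cost = total_cost(h0=set(),
                      hypercontexts=[{1, 2}, {2, 3}],
                      blocks=[[{1}, {2}], [{3}]],
                      w=5)
    print(cost)
```

In the example the first hyperreconfiguration costs 5 + 2 + 2 · 2 and the second 5 + 2 + 2 · 1, because switching from {1, 2} to {2, 3} changes the state of the two switches 1 and 3.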
The PHC-Switch with changeover costs problem is more
difficult to solve than the PHC-Switch problem because the
costs of a hypercontext depend on the predecessor hypercontext.
A consequence is that ≼ is not cost consistent for
this model and therefore it is not enough to consider only
minimal hypercontexts in the sense defined in Section 3.
In the following we describe a polynomial time dynamic
programming algorithm for the PHC-Switch with changeover
costs problem. Due to the cost model for hyperreconfiguration
it is not advantageous to remove and add a switch x
from/to the hypercontext when fewer than 3 contexts not using
x are between the corresponding hyperreconfigurations.
Therefore, we assume that for each switch x a subsequence
of maximal length of context requirements c_i, ..., c_j which
all do not use x has length at least 3. It is easy to guarantee
in time O(m · n) that this property holds. For ease of
description we assume that at least 5 hyperreconfigurations are
done (including the hyperreconfiguration that is employed
before the first context).
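One possible way to establish this property is sketched below. Removing and later re-adding a switch costs two state changes, whereas keeping it available costs one unit per context, so a gap of fewer than 3 unused contexts can be closed without loss. The sketch treats a switch as used throughout every such short gap between two uses; leading and trailing runs of non-use are left untouched, which is an assumption, as is the function name.

```python
from typing import Iterable, List, Set


def fill_short_gaps(contexts: List[Set[int]], switches: Iterable[int],
                    min_gap: int = 3) -> List[Set[int]]:
    """Return a copy of the context requirements in which every run of
    fewer than min_gap consecutive contexts not using x, enclosed between
    two uses of x, is changed to use x."""
    result = [set(c) for c in contexts]
    for x in switches:
        last_use = None
        for i, c in enumerate(contexts):
            if x in c:
                if last_use is not None:
                    gap = i - last_use - 1      # contexts strictly between two uses
                    if 0 < gap < min_gap:
                        for t in range(last_use + 1, i):
                            result[t].add(x)
                last_use = i
    return result


if __name__ == "__main__":
    seq = [{1}, set(), set(), {1}, set(), set(), set(), {1}]
    print(fill_short_gaps(seq, [1]))
```

Each position is inspected once per switch and filled at most once per switch, so the sketch stays within O(m · n) set operations.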
The algorithm computes a table M = (M_{k,l,j,z}), k ∈ [5 : m − 1],
l ∈ [k : m], z ∈ {0, 1, 2}, j < l − 1 for z = 0 and
j < l − 2 for z ∈ {1, 2}, where M_{k,l,j,z} are the minimal costs
for the prefix of length l − 1 of the sequence of contexts
when using k hypercontexts and
• for z = 0 a hyperreconfiguration is done at l and the last
hyperreconfiguration before c_{l−1} is at j < l − 1,
• for z = 1 hyperreconfigurations are done at l and l − 1, and
the last hyperreconfiguration before c_{l−2} is at j < l − 2,
• for z = 2 hyperreconfigurations are done at l, l − 1, l − 2
and the last hyperreconfiguration that is done before c_{l−3}
is at j < l − 2,
where in all three cases the minimal costs include only cost
w for the hyperreconfiguration at l but not the costs for
removing or adding switches from/to the hypercontext during
this hyperreconfiguration. The reason is that when computing
M_{k,l,j,z} it is only decided that a hyperreconfiguration
is done at l but not how the hypercontext is defined. This
hypercontext is determined and the corresponding costs for
adding/removing switches are counted when the (k + 1)th
hypercontext is introduced or when the final costs for the
reconfigurations up to the last context are determined.
It is easy to see that all values M_{k,l,j,z} in the matrix with
k = 5 can be computed in time O(m^3 · n), similarly to the
values with k ≥ 6 as described in the following. We
describe only the main step of the algorithm, which computes
M_{k,l,j,z} when all values M_{k−1,·,·,·} have already been
computed.
The algorithm needs a preprocessing step where for each
i ∈ [1 : m] a sorted list of all switches that are not used
at step i is created. The list is called the list of non-used
switches of i and is denoted list(i). For each element x ∈
list(i) the index v_i(x) of the first context after context c_i
when x is used again is computed. Also, the index u_i(x) of
the last context where x was used before context c_i is
computed. Clearly this preprocessing step can be done in time
O(m · n). For the description of the main step of the algorithm
we make the following case differentiation depending
on the value of z.
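The preprocessing step can be sketched with one backward and one forward sweep over the sequence. The sketch uses 0-based indices (the text uses 1-based ones) and two sentinel values that are assumptions: m when a switch is never used again and −1 when it was never used before. The function and variable names are illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple


def preprocess(contexts: List[Set[int]], switches: Iterable[int]
               ) -> Tuple[List[List[int]], List[Dict[int, int]], List[Dict[int, int]]]:
    """Return (unused, v, u) where unused[i] is the sorted list of switches
    not used by context i, v[i][x] is the index of the first later context
    using x (m if none) and u[i][x] is the index of the last earlier
    context using x (-1 if none)."""
    m = len(contexts)
    ordered = sorted(switches)                       # sort once, reuse the order
    unused = [[x for x in ordered if x not in contexts[i]] for i in range(m)]

    v: List[Dict[int, int]] = [dict() for _ in range(m)]
    next_use = {x: m for x in ordered}
    for i in range(m - 1, -1, -1):                   # backward sweep for v
        for x in unused[i]:
            v[i][x] = next_use[x]
        for x in contexts[i]:
            next_use[x] = i

    u: List[Dict[int, int]] = [dict() for _ in range(m)]
    last_use = {x: -1 for x in ordered}
    for i in range(m):                               # forward sweep for u
        for x in unused[i]:
            u[i][x] = last_use[x]
        for x in contexts[i]:
            last_use[x] = i
    return unused, v, u


if __name__ == "__main__":
    unused, v, u = preprocess([{1}, {2}, set(), {1, 2}], [1, 2])
    print(unused, v, u)
```

Every pair of context index and switch is touched a constant number of times, which matches the O(m · n) bound stated above.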
(1) Case z = 0: M_{k,l,j,0} with j < l − 1 is the minimum
over all the following cost values:
(i) for each M(k − 1, j, h, 0) with h < j − 1 the cost
M(k − 1, j, h, 0)
+ w,
+ the number of switches x ∈ list(j − 1) that
have to be included additionally at the (k − 1)th
hyperreconfiguration at j because they are used
in a context c_{l−1} or earlier (i.e., v_{j−1}(x) < l)
and that are not included in the hypercontext at
h (because u_{j−1}(x) < h),
+ the number of switches y ∈ list(j) that have
to be removed from the hypercontext at j because
they are included in the hypercontext at h
(because h ≤ u_j(y)) and that are not used again
before c_l (i.e., l ≤ v_j(y)),
+ the cost for reconfiguration of the contexts
c_j, ..., c_{l−1}, i.e., the number of switches that
are (now) in the hypercontext at j times (l − j),
(ii) for each M(k − 1, j, h, 1) and each M(k − 1, j, h, 2)
with h < j − 2 the cost M(k − 1, j, h, 1) (respectively,
M(k − 1, j, h, 2))
+ w,
+ the number of switches x ∈ list(j − 1) that have
to be included in the hypercontext additionally
at j with v_{j−1}(x) < l and u_{j−1}(x) < h (respectively
u_{j−1}(x) < j − 2) (similar as in case (i)
but see the following remark),
+ the number of switches y ∈ list(j) that have to
be removed from the hypercontext at j (because
v_j(y) ≥ l) and are not included in the hypercontext
at j − 1 (because u_j(y) = j − 1),
+ the number of switches that are (now) in the
hypercontext at j times (l − j).
Remark. Observe that for M(k − 1, j, h, 1) (M(k − 1, j, h, 2))
switches x ∈ list(j − 1) with h ≤ u_{j−1}(x) < j − 1
(respectively, u_{j−1}(x) = j − 2) have been removed from the
hypercontext at the hyperreconfiguration at j − 1. When for
such a switch v_{j−1}(x) < l then it has to be included again
during the hyperreconfiguration at j. But since it is cheaper
not to perform these two changes of the state of switch x we
assume x has not been removed at the hyperreconfiguration
at j − 1. Note that this does not change the correct value
of M(k − 1, j, h, 1) (respectively, M(k − 1, j, h, 2)).
(2) Case z = 1: M_{k,l,j,1} with j < l − 2 is the minimum over
all the following cost values: for M(k − 1, l − 1, j, 0)
the cost M(k − 1, l − 1, j, 0)
+ w,
+ the number of switches x ∈ list(l − 2) that have to
be included additionally at l − 1 (because v_{l−2}(x) =
l − 1 and u_{l−2}(x) < j) (similar as in (1.i)),
+ the number of switches y ∈ list(l − 1) that have to
be removed from the hypercontext at l − 1 (because
v_{l−1}(y) ≥ l and j ≤ u_{l−1}(y)) (similar as in (1.i)),
+ the number of switches that are (now) in the
hypercontext at l − 1.
(3) Case z = 2: M_{k,l,j,2} with j < l − 2 is the minimum over
all the following cost values: for M(k − 1, l − 1, j, 1)
or M(k − 1, l − 1, h, 2) with h < l − 3 the cost
M(k − 1, l − 1, j, 1) (respectively, M(k − 1, l − 1, h, 2))
+ w,
+ the number of switches x ∈ list(l − 2) that have to
be included additionally at l − 1 (because v_{l−2}(x) =
l − 1 and u_{l−2}(x) < j (respectively, u_{l−2}(x) <
l − 3)) (similar as in (1.i) but see the following
remark),
+ the number of switches y ∈ list(l − 1) that have to
be removed from the hypercontext at l − 1 (because
v_{l−1}(y) ≥ l and l − 2 = u_{l−1}(y)) (similar as in (1.i)),
+ the number of switches that are (now) in the
hypercontext at l − 1.
Remark. Note that each switch x ∈ list(j − 1) with
v_{l−2}(x) = l − 1 has u_{j−1}(x) < l − 4. Then observe that
switches x ∈ list(j − 1) with j ≤ u_{j−1}(x) < l − 2 have
been removed from the hypercontext at the hyperreconfiguration
at l − 2 (considering M(k − 2, l − 2, j, 0)). When
for such a switch v_{l−2}(x) = l − 1 then it has to be included
again during the hyperreconfiguration at l − 1. But since it
is cheaper not to perform these two changes of the state of
switch x we assume x has not been removed at the
hyperreconfiguration at l − 1 (respectively, l − 3). Note that this
does not change the correct value of M(k − 1, l − 1, j, 1).
To find the optimal solution for the PHC-Switch problem
with changeover costs from the table (M_{k,l,j,z}) one has
to compute for each element in the table the total costs
for hyperreconfigurations and reconfigurations. For element
M_{k,l,j,z} one has to determine which switches have to be
removed/included into the hypercontext at l (similarly as
above). Then the number of these switches plus M_{k,l,j,z} plus
the costs for the configurations of contexts c_l, ..., c_m are the
total costs. The minimum over all total costs gives the costs
of the optimal solution. It is not hard to determine also the
optimal solution using standard techniques for dynamic
programming. Since the computation of one element M_{k,l,j,z}
takes time at most O(m^2 · n), there are O(m^2) elements in
the matrix, and the preprocessing takes time O(m · n), we
have shown the following theorem:
Theorem 4. The PHC-Switch with changeover costs problem
can be solved in time O(m^4 · n).
For the PHC-DAG with changeover costs problem we
assume that for each pair of hypercontexts h_1, h_2 changeover
costs changeover(h_1, h_2) are defined. Formally, the problem
is defined as follows.
PHC-DAG with changeover costs problem: Given a
hyperreconfigurable machine in the DAG-model where init(h) =
w for h ∈ H, w > 0, a nonnegative cost function changeover
on H × H and a sequence C = c_1 ... c_m of context requirements.
Find a partition of C into substrings S_1, ..., S_r, r ≥ 1
(i.e. C = S_1 ... S_r) and hypercontexts h_1, ..., h_r such that
S_i ⊂ h_i(C) and
r · w + Σ_{i=1}^{r} (changeover(h_i, h_{i+1}) + |h_i| · |S_i|)
is minimized.
Theorem 5. The PHC-DAG with changeover costs
problem can be solved in time O(m^3 · |H|^2).
The proof is omitted because the problem can be solved
with similar dynamic programming techniques as shown
above.
8. Experiments and results
In order to illustrate the concept of hyperreconfiguration
and to make a basic evaluation of the models and algorithms
presented above two differently grained hyperreconfigurable
architectures are considered in this section. As an example
of a simple model of a fine-grained dynamically reconfig-
urable machine we define the Simple HYperReconfigurable
Architecture (SHyRA) (see Fig. 3). It consists of a set of
reconfigurable Look-Up Tables (LUTs) each with three inputs
and one output. For storing signals a file of registers is used,
which are reconfigurably connected to the LUTs by a mul-
tiplexer (MUX) and a demultiplexer (DeMUX). The system
interfaces to the outside world through the content of the
register file. The execution of a computation on SHyRA is
cycle based. Each computational cycle consists of the fol-
lowing two steps: (i) Input stimuli are propagated from the
registers through the multiplexer to the corresponding in-
puts of the LUTs, (ii) a new output is generated in the LUTs
and stored through the demultiplexer into the register file.
Hyper-/Reconfiguration is possible before every computa-
tional cycle. Due to the architecture’s inability to directly
chain the LUTs for computation in one computational cycle
application processes are forced to make extensive use of
(hyper)reconfigurations. The Switch-model of hyperrecon-
figurable architectures fits to the SHyRA architecture be-
cause each reconfiguration bit for each of the three types of
reconfigurable resources can be viewed as a binary switch.
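The two steps of a computational cycle can be illustrated with a toy simulation. It is only a sketch under assumptions: each LUT is modelled by an 8-bit truth table, three register indices chosen by the multiplexer and one register index chosen by the demultiplexer. The bit ordering, the class name LutConfig and the function name cycle are hypothetical and do not describe the actual SHyRA encoding.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LutConfig:
    truth_table: int               # 8 bits; bit p is the output for input pattern p
    inputs: Tuple[int, int, int]   # register indices routed by the MUX
    output: int                    # register index written through the DeMUX


def cycle(registers: List[int], luts: List[LutConfig]) -> List[int]:
    """One computational cycle: (i) read the inputs of every LUT from the
    old register contents, (ii) store every LUT output in the register file."""
    new_registers = list(registers)
    for lut in luts:
        a, b, c = (registers[r] for r in lut.inputs)
        pattern = a | (b << 1) | (c << 2)
        new_registers[lut.output] = (lut.truth_table >> pattern) & 1
    return new_registers


if __name__ == "__main__":
    # A full adder built from two LUTs: sum (XOR of three bits) and carry (majority).
    full_adder = [
        LutConfig(truth_table=0x96, inputs=(0, 1, 2), output=3),
        LutConfig(truth_table=0xE8, inputs=(0, 1, 2), output=4),
    ]
    print(cycle([1, 1, 0, 0, 0], full_adder))
```

Because all LUTs read the old register contents, the output of one LUT cannot feed another LUT within the same cycle, which mirrors the missing direct chaining mentioned above.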
As a test application for the SHyRA we have chosen a
part of a simple control task consisting of a sequence of 5
computation phases with counter and adder circuits. After
mapping the design onto the reconfigurable resources (LUTs
and MUX/DeMUX) the sequence of context requirements
which is depicted in Fig. 4 was determined by analyzing the
change of the reconfiguration bits at each clock cycle.
For a more coarse-grained example a VLIW processor is
seen as a hyperreconfigurable machine where the functional
units are the reconfigurable resources. A reconfiguration step
takes place every clock cycle. Executing an algorithm on
a VLIW processor thus closely resembles a rapidly reconfiguring
system. Because of the high number of functional
units a VLIW code word rather frequently encodes several
idle operations for units currently not usable due to a lack
of parallelism in the algorithm. The concept of hyperreconfigurability
can be employed to limit the effect of these idle
operations.

Fig. 3. Simple HYperReconfigurable Architecture: principal system design.

Fig. 4. Sequence of 141 context requirements for control task application
on SHyRA: For each context requirement the status (required/used or not
required/used) of each of 8 bits for all 19 LUTs is given.
[Figure panels: Phase I, Phase II, Phase III, Phase IV, Phase V.]
The TMS320C6201 signal processor from Texas Instruments
was used for the experiments. It features 4 types
of functional units (L = logic, M = multiply, D =
load/store, S = shift) with 2 units each, for a total of 8
functional units. The accompanying C compiler produces
optimized and parallelized code and outputs the assembly
code as well, which was used for our simulations. As an
example algorithm a vector summation was implemented
in C, compiled into parallelized VLIW code and executed.
During execution the state (idle or busy) of each functional
unit was recorded for each clock cycle, yielding an 8-bit
vector for every clock cycle. The sequence of these vectors
for the VLIW processor can be seen as a sequence of
context requirements (see Fig. 5) for a hyperreconfigurable
machine that works in the Switch-model. The test run with a
vector size of 44 resulted in 62 executed VLIW instructions,
corresponding to 62 context requirements.

Fig. 5. Sequence of 62 context requirements for the vector summation
application on VLIW processor: For each context requirement the status
(required/used or not required/used) of each of 8 functional units is given.
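The step from recorded busy/idle vectors to context requirements in the Switch-model can be sketched as follows. The textual trace format (one string of eight '0'/'1' characters per clock cycle), the order of the units within a vector and the function names are assumptions made for illustration; the recorded trace itself is not reproduced here.

```python
from typing import FrozenSet, List

# Hypothetical order of the 8 functional units inside one recorded vector.
UNITS = ["L1", "L2", "M1", "M2", "D1", "D2", "S1", "S2"]


def to_context_requirements(trace: List[str]) -> List[FrozenSet[int]]:
    """Turn per-cycle busy/idle strings into sets of busy unit indices."""
    contexts = []
    for word in trace:
        if len(word) != len(UNITS) or set(word) - {"0", "1"}:
            raise ValueError(f"malformed state vector: {word!r}")
        contexts.append(frozenset(i for i, bit in enumerate(word) if bit == "1"))
    return contexts


def never_used(contexts: List[FrozenSet[int]]) -> List[str]:
    """Names of the units that are idle in every recorded cycle."""
    used = set().union(*contexts) if contexts else set()
    return [name for i, name in enumerate(UNITS) if i not in used]


if __name__ == "__main__":
    demo_trace = ["10000000", "00110000", "00000000"]   # made-up example data
    ctx = to_context_requirements(demo_trace)
    print(ctx, never_used(ctx))
```

Each resulting set plays the role of one context requirement, so a trace of 62 instructions yields a sequence of 62 sets.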
The obtained sequences of context requirements for the
fine-grained SHyRA and the coarse-grained VLIW can be
seen as instances of the PHC-Switch problem. For both problem
instances the optimal times for hyperreconfigurations
and the corresponding hypercontexts were determined with
the algorithm from Section 6 for standard costs and with
the algorithm from Section 7 for changeover costs. Both test
instances were evaluated with different init costs for the
hyperreconfiguration. For the standard model the costs were
init(h) = n + k where n is the total number of bits (n = 144
for the SHyRA and n = 8 for the VLIW) and k ∈ [0, 150]
can be seen as additional base costs. For the changeover cost
model init(h) = |h_1 Δ h_2| + k where h_1, h_2 are the context
requirements before and after the hyperreconfiguration and
k ∈ [0, 150] can be seen as the base costs.
Fig. 6 shows the optimal total (hyper)reconfiguration costs
for different base costsk ∈ [0,150] for the control task on
SHyRA. The figure shows also the total number of hyper-
reconfigurations. It can be observed that the costs for the
PHC-Switch model with changeover costs are significantly
lower than in the standard PHC-Switch model for the same
base costs. The reason is that the changeover model can profit
from the fact that for this application consecutive hypercontexts
are often similar. Therefore, the PHC-Switch model
with changeover costs can profit more from the hyperreconfigurations
in the sense that it does more hyperreconfigurations
in the optimal solution. Especially when the base costs
are low, the number of hyperreconfigurations is much higher
than for the standard cost model.
The optimal costs and number of hyperreconfigurations
for vector summation on the VLIW are shown in Fig. 7.
Here again the standard PHC-Switch model produces significantly
higher costs when the base costs are small. But
in contrast to the SHyRA application the number of
hyperreconfigurations is often the same for both cost models. For
higher base costs the total hyperreconfiguration costs for
both cost models become very similar because in both cases
only one hypercontext is used. Thus, the total hyperreconfiguration
costs differ only by the cost for the first
hyperreconfiguration (this is n + k for standard costs and n + k
minus the number of bits that are never used for changeover
costs).

Fig. 6. Total (hyper)reconfiguration costs and number of hypercontexts

Fig. 7. Total (hyper)reconfiguration costs and number of hypercontexts
with changing hyperreconfiguration signaling costs for a vector summation
on a VLIW machine.
In order to evaluate the possible profit from hyperreconfigurations
we compared the total (hyper)reconfiguration costs
for either of the two hyperreconfiguration models with the
case that hyperreconfigurations are not allowed. For the latter
case it is assumed that all switches have to be fully
specified at each reconfiguration step. Fig. 8 shows the relative
total hyperreconfiguration costs for the hyperreconfigurable
case (100% are the total reconfiguration costs when
hyperreconfigurations are not allowed). For the PHC-Switch
model with changeover costs and base costs zero the total
(hyper)reconfiguration costs are reduced to 36% for the control
task and 53% for the vector summation example. Even
the standard cost model reduced the costs to 47% and,
respectively, 57% for base costs zero. Clearly, when the base
costs are increased the reduction decreases. In the case of
the vector summation, the point where it becomes
equally expensive to employ the concept of hyperreconfiguration
appears with base costs of 118 for the standard model
and 120 for the model with changeover costs. These are extreme
cases where the size of a hypercontext (and thus its
actual costs for specification) is 8 and thus the additionally
introduced base costs are 14.75 times larger. For the control
task instance hyperreconfiguration reduced the costs by
more than 40% even with base costs 150. The results show
that there is a large potential for increasing the speed of dynamic
reconfigurations when using hyperreconfigurations.
Of course, the cost model which is used here is only a simple
example to illustrate our approach and it has to be figured
out for real architectures how hyperreconfiguration can
exactly be realized.

Fig. 8. Ratio of total (hyper)reconfiguration costs for the hyperreconfigurable
SHyRA and VLIW applications compared to the total reconfiguration
cost for the case that no hyperreconfigurations are allowed.
9. Conclusion
The concept of hyperreconfigurable architectures was in-
troduced in this paper. This was motivated by the observa-
tion that for dynamically reconfigurable architectures or sys-
tems there is a tradeoff between flexibility and the amount
of reconfiguration information that has to be transferred
to the reconfigurable units. Hyperreconfigurable architectures
offer a new type of reconfiguration operation, called
hyperreconfiguration, that allows them to alter the potential for
reconfiguration that is available during run time. A central
problem that emerges on hyperreconfigurable systems
is to determine for a given computation the best points in
time when a hyperreconfiguration should be performed. This
problem has been called the Partition into Hypercontexts
(PHC) problem. It was shown that the general PHC problem
is NP-complete. But for several models of hyperreconfig-
urable machines fast polynomial time algorithms have been
described to solve the PHC problem. As an illustrating ex-
ample and to evaluate the introduced concepts we presented
solutions of the PHC problem for example algorithms on
a fine-grained and a coarse-grained reconfigurable architec-
ture. The concept of hyperreconfigurations offers interesting
questions for further research. For instance how can such ar-
chitectures be realized and what are the best cost measures?
How can the resource requirements be estimated? What specific
problems occur for multi task hyperreconfigurable architectures?
While we have considered the latter question in
a companion paper [10], the other questions will be subject
to our future research.
References
[2] K. Bondalapati, V.K. Prasanna, Reconfigurable Computing:
Architectures, Models and Algorithms, in: Proceedings of the
Reconfigurable Architectures Workshop, 1997.
[3] K. Compton, S. Hauck, Configurable computing: a survey of systems
and software, ACM Comput. Surveys 34 (2) (1994) 171–210.
[4] A. Dandalis, V.K. Prasanna, Configuration compression for FPGA-
based embedded systems, in: Proceedings of the ACM International
Symposium on Field-Programmable Gate Arrays, 2001, pp. 173–182.
[7] S. Hauck, Z. Li, J.D.P. Rolim, Configuration compression for the
Xilinx XC6200 FPGA, IEEE Trans. on CAD of Integrated Circuits
and Systems (8) (1999) 1107–1113.
[8] P. Kannan, S. Balachandran, D. Bhatia, On metrics for comparing
routability estimation methods for FPGAs, in: Proceedings of the
39th Design Automation Conference, 2002, pp. 70–75.
[9] M. Koester, J. Teich, (Self)-reconfigurable finite state machines:
theory and implementation, in: Proceedings of the 2002 Design,
Automation and Test in Europe, 2002, pp. 559–566.
[10] S. Lange, M. Middendorf, Multi task hyperreconfigurable
architectures: models and reconfiguration problems, Internat. J.
Embedded Systems, to appear (Preliminary version in Proceedings
of the 11th Reconfigurable Architectures Workshop, 2004).
[11] K.K. Lee, D.F. Wong, Incremental reconfiguration of multi-FPGA
systems, in: Proceedings of the 10th ACM International Symposium
on Field Programmable Gate Arrays, 2002, pp. 206–213.
[12] R.P.S. Sidhu, S. Wadhwa, A. Mei, V.K. Prasanna, A self-
reconfigurable gate array architecture, in: Proceedings of the FPL,
2000, pp. 106–120.
[16] S. Wadhwa, A. Dandalis, Efficient self-reconfigurable
implementations using on-chip memory, in: Proceedings of the FPL,
2000, pp. 443–448.
Martin Middendorf received the Diploma degree in Mathematics and a
Dr. rer. nat. at the University of Hannover, Germany, in 1988 and 1992,
respectively. He gained his professorial Habilitation in 1998 at the Uni-
versity of Karlsruhe, Germany. He has worked at the University of Dort-
mund, Germany, and the University of Hannover, Germany, as a visiting
Professor of Computer Science. He was Professor of Computer Science
at the Catholic University of Eichstätt, Germany. Currently he is profes-
sor for Parallel Computing and Complex Systems with the University of
Leipzig, Germany. His research interests include reconfigurable architec-
tures, parallel algorithms, algorithms from nature and bioinformatics.
Sebastian Lange received the Diploma degree in Computer Science
at the University of Leipzig, Germany, in 2003. He started his Ph.D.
at the Department of Computer Science, University of Leipzig, in the
same year. Currently he is working on Hyperreconfigurable Architectures
within the DFG Priority Programme Reconfigurable Computing Systems.
His research interests include reconfigurable architectures, fault-tolerant
design and bio-inspired algorithms.
