Dynamically reconfigurable architectures or systems are able to reconfigure their function and/or structure to suit the changing needs of a computation during run time. The increasing flexibility of modern dynamically reconfigurable systems improves their adaptability to computational needs but also makes fast reconfiguration difficult because of the large amount of reconfiguration information which has to be transferred. However, even when a computation uses this flexibility it will not use it all the time. Therefore, we propose to make the potential for reconfiguration itself reconfigurable. Such architectures are called hyperreconfigurable. Different models of hyperreconfigurable architectures are proposed in this paper. We also study a fundamental problem that emerges on such architectures, namely, to determine for a given computation when and how the potential for reconfiguration should be changed during run time so that the reconfiguration overhead is minimal. It is shown that the general problem is NP-hard but fast polynomial time algorithms are given to solve this problem for special types of hyperreconfigurable architectures. We define two example hyperreconfigurable architectures and illustrate the introduced concepts for corresponding application problems.
Introduction
Dynamically reconfigurable architectures or systems can adapt their function and/or structure to suit the changing needs of a computation during run time (e.g., [2, 3]). A principal problem of dynamically reconfigurable systems is the tradeoff between flexibility and the amount of information needed for reconfiguration to define the new state of the system. Moreover, the increasingly higher integration of reconfigurable hardware, e.g., reconfigurable circuits on an FPGA chip, requires increased bandwidth for transferring the reconfiguration information. Modern FPGAs, for example, need several megabytes of reconfiguration data for a single reconfiguration step. This large amount of data transfer makes dynamic reconfiguration a time-critical operation, especially for computations that exploit the full capacity of dynamically reconfigurable architectures through frequent reconfigurations.
Different approaches have been proposed in the literature to cope with this problem. Dandalis and Prasanna [4] have applied off-line compression methods to the stream of reconfiguration bits. The compressed stream can be loaded onto the chip faster, but additional on-chip hardware is necessary to decompress the reconfiguration bit stream at run time before it is needed to define the next configuration. Another method for compressing the reconfiguration bit stream, suitable especially for the Xilinx XC6200 architecture, has been described by Hauck et al. [7]. For multi-FPGA systems, Lee and Wong [11] have proposed to perform the reconfiguration incrementally so that only parts of the FPGAs need to be reconfigured at the same time. A third approach is self-reconfigurability, which means that the reconfiguration bits are computed directly on the chip so that they can be transferred faster to the system units that are reconfigured (see Köster and Teich [9], Sidhu et al. [12], Wadhwa and Dandalis [16]). All these approaches have in common that they do not change the reconfiguration information itself.
In this paper, we propose a new approach to make run time reconfiguration faster by defining a new type of reconfigurable architecture. We use the fact that algorithms or computations typically consist of different phases, where during each phase only a fraction of the reconfiguration potential of the underlying architecture is needed. The idea is to make the reconfiguration potential itself reconfigurable. The smaller the actual reconfiguration potential of an architecture, the less reconfiguration information has to be transferred during reconfiguration and the faster a reconfiguration step becomes. We call such architectures hyperreconfigurable architectures.
Hyperreconfigurable architectures use two types of reconfiguration steps: (i) steps in which the reconfiguration potential of the architecture is defined, and (ii) standard reconfiguration steps, which reconfigure the hardware according to the contexts demanded by the algorithm. Steps of the first type are called hyperreconfiguration steps.
A central problem that emerges on hyperreconfigurable architectures is to determine when hyperreconfiguration steps should be taken and how the reconfiguration potential should be defined in these steps in order to minimize the total time necessary for the (hyper)reconfigurations of a computation. We call this problem the Partition into Hypercontexts (PHC) problem and show that it is NP-hard. We also describe polynomial time algorithms for several variants of PHC on the so-called Switch model and DAG model of hyperreconfigurable architectures. To illustrate the ideas in this paper, we consider two hyperreconfigurable architectures of different granularity. For each of these architectures we study an instance of the PHC problem and give optimal solutions that were derived with the presented algorithms.
The paper is organized as follows. In Section 2, we describe the concept of hyperreconfigurable architectures. Formal models for such architectures and the Partition into Hypercontexts (PHC) problem are defined in Section 3. The NP-hardness of PHC is shown in Section 4. In Sections 5 and 6 we discuss polynomial time solvable cases of the PHC problem. A variant of the PHC problem with changeover costs is studied in Section 7. Experimental results for the example architectures are presented in Section 8. The paper ends with a conclusion in Section 9.
The concept of hyperreconfigurable architectures
We call dynamically reconfigurable architectures and systems that allow the reconfiguration potential to be altered during run time hyperreconfigurable architectures. Hyperreconfigurable architectures have two types of reconfiguration steps. The (ordinary) reconfiguration steps are used to actually define a new configuration of the system. The state of the system that can be changed by reconfiguration is called the context of a computation. Hyperreconfiguration steps are used to define the actual reconfiguration potential of the architecture that is available for the ordinary reconfiguration steps. Thus, a hyperreconfiguration step defines the set of contexts that is available for the (ordinary) reconfiguration steps. Such a set of available contexts is called a hypercontext. By "available" we mean those reconfigurable resources that are activated by the hypercontext and are therefore available for reconfiguration. If a reconfiguration needs resources that are not included in the hypercontext, they have to be activated (included) by a hyperreconfiguration. We assume that a reconfiguration step requires reconfiguration information for all activated resources (even when the information is only that an activated resource is not used in the corresponding context). Thus, we are interested in the case where the cost of a reconfiguration step (e.g., the time or the number of bits that have to be loaded onto the architecture) depends on the current hypercontext. Formal models for hyperreconfigurable architectures are discussed in the next section.
This concept is illustrated in the following example. Consider a switch box for an FPGA where the state of each switch is determined by the content of a corresponding SRAM-cell as depicted on the right-hand side of Fig. 1 . The content of these SRAM-cells is the current context of the switch box.
During reconfiguration the SRAM cells are chained sequentially to form a shift register that shifts in one bit of the new context and shifts out one bit of the old context at each time step. To enable hyperreconfigurability, a second chain of SRAM cells is introduced to store the hypercontext of the switch box. Each of these cells controls two switches, one in front of and one behind its corresponding (context) SRAM cell. These switches determine whether the SRAM cell is part of the shift register, so that the reconfiguration bits are sent through it during reconfiguration, or whether it is bypassed and thereby excluded from reconfiguration. In the example of Fig. 1, two of the three hypercontext SRAM cells shown contain a 1, which means that the corresponding context SRAM cells are included in the current chain of context SRAM cells. The other hypercontext SRAM cell contains a 0, which means that the corresponding context SRAM cell is bypassed and therefore not included in the chain. Since the time for reconfiguration is determined by the number of bits that have to be shifted in, it depends directly on the hypercontext. Loading a new hypercontext is done analogously to reconfiguration: the (hypercontext) SRAM cells form a shift register and the bits of the new hypercontext are shifted in. Since the number of bits in the hypercontext is always the same, the time for a hyperreconfiguration step does not change. In this example it equals the time for a reconfiguration step in which all context SRAM cells are included in the hypercontext.

Often it will not be possible to determine the context requirements of an algorithm exactly in advance. This is typically the case when a context depends on data that are computed at run time. For the concept of hyperreconfigurable architectures it is sufficient that an upper bound on the requirements that will actually arise during run time can be given.
For example, it might be possible to know in advance that the routing requirements will be low during a certain phase of the algorithm even if the exact routing is not known in advance. Note that several methods for resource estimation on reconfigurable architectures have appeared in the literature (e.g., see [8]).
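The switch-box mechanism described in this section can be made concrete with a minimal Python sketch. It assumes one context SRAM cell and one hypercontext SRAM cell per switch and one bit shifted per time step; the class and method names are hypothetical, and the shift register is modeled only by its net effect (the returned number of shift steps), not bit by bit.

```python
class HyperSwitchBox:
    """Hypothetical model of a hyperreconfigurable switch box.

    hyper[i] == 1 means context cell i is part of the shift chain;
    hyper[i] == 0 means it is bypassed and keeps its old content.
    """

    def __init__(self, num_switches: int) -> None:
        self.n = num_switches
        self.hyper = [1] * num_switches    # hypercontext bits
        self.context = [0] * num_switches  # context bits (switch states)

    def hyperreconfigure(self, hyper_bits: list) -> int:
        """Load a new hypercontext; always costs n shift steps."""
        if len(hyper_bits) != self.n:
            raise ValueError("need one hypercontext bit per switch")
        self.hyper = list(hyper_bits)
        return self.n

    def reconfigure(self, bits: list) -> int:
        """Load a new context through the non-bypassed cells only.

        Returns the number of shift steps, i.e., the current chain length.
        """
        chain = [i for i, h in enumerate(self.hyper) if h == 1]
        if len(bits) != len(chain):
            raise ValueError("need one context bit per included cell")
        for i, b in zip(chain, bits):
            self.context[i] = b  # net effect of shifting len(chain) bits
        return len(chain)


box = HyperSwitchBox(3)
print(box.hyperreconfigure([1, 0, 1]))  # 3: hyperreconfiguration cost is fixed
print(box.reconfigure([1, 1]))          # 2: only included cells are loaded
```

With two of the three cells included, a reconfiguration takes two shift steps, while every hyperreconfiguration takes three, independent of the hypercontext being loaded.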
Formal models and the partition into hypercontexts problem
In this section we introduce formal models for hyperreconfigurable architectures (or machines). These models are deliberately general, so that general algorithmic aspects of such architectures (in particular the PHC problem) can be studied. For concrete architectures the models can be made more specific.
We assume that an algorithm or a computation is characterized by a sequence of context requirements. Each context requirement describes the resource requirements that the algorithm/computation will have for a corresponding reconfiguration step performed during its run. Hence, the number of context requirements equals the number of reconfiguration steps. Formally, let C be the set of possible context requirements for a reconfigurable machine. Then an algorithm/computation is characterized by the sequence $C = c_1 \ldots c_m$ of its context requirements during run time. Since the actual reconfiguration steps might depend on data that are only available at run time, a context requirement always specifies the (estimated) maximal set of resources that could possibly be needed. When the meaning is clear, we sometimes simply call the context requirements of an algorithm/computation its contexts, since each context requirement corresponds to exactly one new context that is reconfigured during the run of the algorithm.
A reconfiguration into a new context can in general only be realized during run time when the machine is in a hypercontext that contains at least all contexts possible according to the corresponding context requirement. In this case the hypercontext satisfies the context requirement. Formally, a hypercontext h is a state of the reconfigurable machine that is characterized by the subset $h(C) \subseteq C$ of context requirements that are satisfied when the machine is in this state. At any time exactly one hypercontext is realized on the machine. Let H be the set of possible hypercontexts.
In the example hyperreconfigurable switch box (see Section 2), the set of hypercontexts H can be the set of all subsets of switches, and a context requirement c can be a subset of the switches. Then for a hypercontext $h \in H$ the relation $c \in h(C)$ holds if $c \subseteq h$, i.e., all switches that are required for c are in the hypercontext h. The set of context requirements will usually depend not only on the architecture but also on the application and on how well the needs of the algorithm can be analyzed. For the switch box example the set of context requirements C can be the set of all subsets of switches; then H = C. But it is also possible that C contains only a few subsets of switches, e.g., only the set of all switches and the set of all switches on the diagonal of the switch box.
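For the switch-box example, the satisfaction relation is just a subset test. The following Python sketch computes h(C) for a hypothetical 2×2 switch box whose only possible context requirements are "all switches" and "the diagonal", as in the example above; the switch names are invented for illustration.

```python
def satisfies(h: frozenset, c: frozenset) -> bool:
    """A hypercontext h satisfies requirement c iff c is a subset of h."""
    return c <= h


def satisfied_set(h: frozenset, requirements: set) -> set:
    """h(C): all context requirements in C that h satisfies."""
    return {c for c in requirements if satisfies(h, c)}


# Hypothetical 2x2 switch box with switches s11, s12, s21, s22.
ALL = frozenset({"s11", "s12", "s21", "s22"})
DIAG = frozenset({"s11", "s22"})
C = {ALL, DIAG}  # only two possible context requirements

print(satisfied_set(DIAG, C) == {DIAG})      # True
print(satisfied_set(ALL, C) == {ALL, DIAG})  # True
```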
In order to change the machine's current hypercontext, a hyperreconfiguration step is necessary. To measure the costs of hyperreconfiguration steps and reconfiguration steps we introduce two cost measures for each hypercontext $h \in H$: (i) init(h) is the cost of performing a hyperreconfiguration that brings the machine into hypercontext h; (ii) cost(h) is the cost of an ordinary reconfiguration step when the machine is in hypercontext h. In the example hyperreconfigurable switch box (see Section 2), init(h) is the time to load the hypercontext bits into the hypercontext SRAM cells and cost(h) is the time to load the context bits into the chain of context SRAM cells. Note that in this example cost(h) depends on the number of SRAM cells that are currently in the chain, i.e., not bypassed.
A computation is characterized by a partition of C into substrings $S_1, \ldots, S_r$ (i.e., $C = S_1 \ldots S_r$) and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(C)$; its total (hyper)reconfiguration costs are

$$\sum_{i=1}^{r} \bigl( \mathrm{init}(h_i) + |S_i| \cdot \mathrm{cost}(h_i) \bigr),$$

where $|S_i|$ is the length of $S_i$, i.e., the number of context requirements in $S_i$. When the algorithm/computation is executed, the machine performs the following sequence of operations: $h_1 S_1 \ldots h_r S_r$, where $h_i$ stands for a hyperreconfiguration into hypercontext $h_i$ and $S_i$ for a sequence of $|S_i|$ reconfigurations that use only those parts of the machine that are available within $h_i$. It is assumed that a hyperreconfiguration is always performed before the first reconfiguration step. An important problem that emerges for a hyperreconfigurable machine and a given algorithm (i.e., a sequence of context requirements) is to decide when hyperreconfigurations are done and how the corresponding hypercontexts are defined, such that the context requirements of the algorithm are satisfied and the total costs of the hyperreconfiguration steps and the ordinary reconfiguration steps are minimized. The PHC problem can be defined as follows.
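The cost of an execution $h_1 S_1 \ldots h_r S_r$, one init(h_i) per hyperreconfiguration plus cost(h_i) for each of the $|S_i|$ reconfigurations, can be computed as in the following Python sketch. The function and helper names are hypothetical, and the example costs (init(h) = 3, cost(h) = |h| on three switches) are an illustrative choice.

```python
def total_cost(parts, init, cost, satisfies):
    """Total (hyper)reconfiguration cost of the schedule h_1 S_1 ... h_r S_r.

    parts:     list of (hypercontext, substring) pairs in execution order
    init:      function giving init(h)
    cost:      function giving cost(h)
    satisfies: predicate telling whether hypercontext h satisfies requirement c
    """
    total = 0
    for h, S in parts:
        if not all(satisfies(h, c) for c in S):
            raise ValueError("hypercontext does not satisfy its substring")
        # one hyperreconfiguration plus |S| ordinary reconfigurations
        total += init(h) + len(S) * cost(h)
    return total


def subset_rule(h, c):
    return c <= h  # switch-box style satisfaction: c is a subset of h


def init_const(h):
    return 3  # hypothetical: hyperreconfiguration always costs 3


seq = [{"a"}, {"a"}, {"a", "b", "c"}]
two = [({"a"}, seq[:2]), ({"a", "b", "c"}, seq[2:])]
one = [({"a", "b", "c"}, seq)]
print(total_cost(two, init_const, len, subset_rule))  # 11 = (3 + 2*1) + (3 + 1*3)
print(total_cost(one, init_const, len, subset_rule))  # 12 = 3 + 3*3
```

Here the extra hyperreconfiguration pays off because the first two reconfigurations run in a much smaller hypercontext.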
Partition into Hypercontexts (PHC) problem
Given a hyperreconfigurable machine (as described above) and a sequence $C = c_1 \ldots c_m$ of context requirements. Find a partition of C into substrings $S_1, \ldots, S_r$, $r \ge 1$ (i.e., $C = S_1 \ldots S_r$), and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(C)$ and the total (hyper)reconfiguration costs are minimal.

Two variants of the model for hyperreconfigurable architectures are introduced. The DAG-model is for coarse-grained reconfigurable machines, where different reconfigurable submachines (hypercontexts) can be defined (through hyperreconfiguration) that can be ordered with respect to their computational power. It is assumed that a submachine that extends another submachine, and therefore has a larger computational power, produces higher reconfiguration costs.
A directed acyclic graph (DAG) describes the precedence relation between the hypercontexts. Formally, let $G = (H, E)$ be a DAG whose vertices are the hypercontexts.
It is assumed that a hypercontext h exists that satisfies all possible context requirements, i.e., $h(C) = C$. In addition, let $\mathrm{cost}(h) > 0$ and $\mathrm{init}(h) = w$ for each $h \in H$, and let $k \ge 0$ be a constant such that for each edge
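. Then a computation is characterized by a partition of C into substrings $S_1, \ldots, S_r$, $r \ge 1$ (i.e., $C = S_1 \ldots S_r$), and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(C)$, and the total (hyper)reconfiguration costs are

$$\sum_{i=1}^{r} \bigl( w + |S_i| \cdot \mathrm{cost}(h_i) \bigr) = r \cdot w + \sum_{i=1}^{r} |S_i| \cdot \mathrm{cost}(h_i).$$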
For each context requirement c ∈ C let c(H) be the set of minimal (with respect to the precedence relation defined by E) hypercontexts h in the DAG which satisfy c ∈ h(C). The PHC problem for the DAG-model can be defined as follows.
PHC-DAG problem
Given a hyperreconfigurable machine in the DAG-model and a sequence $C = c_1 \ldots c_m$ of context requirements. Find a partition of C into substrings $S_1, \ldots, S_r$, $r \ge 1$ (i.e., $C = S_1 \ldots S_r$), and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(C)$ and the total (hyper)reconfiguration costs are minimal.
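The set c(H) of minimal hypercontexts for a context requirement c can be computed from the DAG by reachability. The Python sketch below assumes (this is our assumption, not stated in the text) that an edge (h, h') points from a hypercontext to one that extends it, so that h precedes every hypercontext reachable from it; the DAG, its vertex names, and the satisfying set are hypothetical.

```python
def reachable(succ: dict, start) -> set:
    """All vertices reachable from start via one or more DAG edges."""
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def minimal_hypercontexts(succ: dict, satisfying: set) -> set:
    """c(H): satisfying hypercontexts not reachable from another satisfying one."""
    above = {g: reachable(succ, g) for g in satisfying}
    return {h for h in satisfying
            if not any(h in above[g] for g in satisfying if g != h)}


# Hypothetical DAG: small -> medium -> full and alt -> full.
succ = {"small": ["medium"], "medium": ["full"], "alt": ["full"]}
sat_c = {"medium", "alt", "full"}  # hypercontexts h with c in h(C)
print(sorted(minimal_hypercontexts(succ, sat_c)))  # ['alt', 'medium']
```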
The second variant of a hyperreconfigurable machine is called the Switch-model. In contrast to the DAG-model, which is mainly suited to coarse-grained hyperreconfigurable machines, the Switch-model is also well suited to fine-grained hyperreconfigurable machines. Here we assume that there exists a set of small (similar) reconfigurable units and that every subset of these units can be used to define the reconfigurable machine that is available during a hypercontext. For example, each unit might be a switch, and the set of switches defines the available part of the reconfigurable machine. Examples are the hyperreconfigurable switch boxes of an FPGA (see Section 2) that connect the functional units. The larger the (possible) routing requirements of an algorithm for a context, the more switches should be available in the hypercontext for reconfiguration during run time. For reconfiguration the state of each available switch has to be defined, so the cost of a reconfiguration is just the number of available units plus some overhead cost. Hence, in the Switch-model the reconfigurable machine that is available during a hypercontext is defined by the subset of available units.
Formally, let $X = \{x_1, \ldots, x_n\}$ be a set of switches and define $C = H = 2^X$, i.e., the set of possible context requirements C and the set of possible hypercontexts H both equal the set of all subsets of X. For a context requirement $c \in C$ the relation $c \in h(C)$ holds when $c \subseteq h$. Let $\mathrm{cost}(h) = |h|$, where $|h|$ is the size of h, i.e., the number of switches available in h. Let $\mathrm{init}(h) = n$ for $h \in H$, which reflects the fact that during hyperreconfiguration it has to be defined for each switch whether it is available in the new hypercontext. A computation is characterized by a partition of C into substrings $S_1, \ldots, S_r$, $r \ge 1$ (i.e., $C = S_1 \ldots S_r$), and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(C)$, and the total (hyper)reconfiguration costs are

$$\sum_{i=1}^{r} \bigl( n + |S_i| \cdot |h_i| \bigr).$$
The PHC problem for the Switch-model can be defined as follows.
PHC-Switch problem
Given a hyperreconfigurable machine in the Switch-model with set of switches $X = \{x_1, \ldots, x_n\}$ and a sequence of context requirements $C = c_1 \ldots c_m$. Find a partition of C into substrings $S_1, \ldots, S_r$, $r \ge 1$ (i.e., $C = S_1 \ldots S_r$), and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(C)$ and the total (hyper)reconfiguration costs are minimal.
Note that for the PHC-Switch problem there exist $2^n$ hypercontexts, but this number is not part of the size of the problem instance, which is polynomial in n and m. This is different for the PHC-DAG problem, where the DAG is part of the instance and therefore the number of possible hypercontexts is also part of the instance.
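In order to solve the PHC problem we make a simple but useful observation. A partial order $\preceq$ on the set of hypercontexts (i.e., $\preceq$ is a reflexive, antisymmetric, and transitive relation on H) can be defined naturally by the subset relation on the sets of contexts that are satisfied by the hypercontexts. Thus, $h \preceq h'$ if and only if $h(C) \subseteq h'(C)$. We call $\preceq$ cost consistent if $h \preceq h'$ implies $\mathrm{init}(h) \le \mathrm{init}(h')$ and $\mathrm{cost}(h) \le \mathrm{cost}(h')$. If $\preceq$ is cost consistent, there exists an optimal solution in which every hypercontext $h_i$ is minimal for $S_i$ (i.e., there exists no hypercontext $h \in H$, $h \ne h_i$, with $S_i \subseteq h(C)$ and $h \prec h_i$).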
It is easy to see that the relation $\preceq$ is cost consistent for the DAG-model and the Switch-model.
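In the Switch-model a hypercontext h satisfies a substring $S_i$ exactly when h contains every requirement in $S_i$, so the union of these requirements is the unique minimal hypercontext for $S_i$; since init(h) = n is constant and cost(h) = |h| grows with h, it is also the cheapest choice. A small Python sketch (function names hypothetical, switches given as integer indices):

```python
def minimal_switch_hypercontext(S: list) -> set:
    """Unique minimal hypercontext for substring S in the Switch-model."""
    return set().union(*S)


def switch_partition_cost(n: int, parts: list) -> int:
    """Sum over substrings of n + |S_i| * |h_i| with each h_i chosen minimal."""
    total = 0
    for S in parts:
        h = minimal_switch_hypercontext(S)
        total += n + len(S) * len(h)
    return total


seq = [{0}, {0}, {0}, {0}, {0, 1, 2, 3}]
print(switch_partition_cost(4, [seq]))               # 24 = 4 + 5*4
print(switch_partition_cost(4, [seq[:4], seq[4:]]))  # 16 = (4 + 4*1) + (4 + 1*4)
```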
NP-Hardness
In this section we state that the general PHC problem is NP-complete, which implies that it cannot be solved in polynomial time unless P = NP.
Theorem 1. The PHC problem is NP-complete.
We give only the proof idea, because the proof itself is somewhat technical and uses standard constructions for proving NP-completeness. For a proof one can encode an instance of an NP-hard problem, say 3-SAT, in a sequence of contexts C. Then a cost function and a set of hypercontexts can be defined such that C has a cheap partition into hypercontexts if and only if the encoded 3-SAT instance is satisfiable: the construction ensures that no partition of C into several substrings can be covered cheaply by hypercontexts, so a cheap solution must consist of a single hypercontext, and such a hypercontext exists exactly when the 3-SAT instance is satisfiable.
Polynomial time algorithm for PHC-DAG
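The algorithm for the PHC-DAG problem described in this section is a dynamic programming algorithm that computes a table $M = (M_{k,j})_{1 \le k \le j \le m}$, where $M_{k,j}$ are the minimal costs for the prefix of length j of the sequence of context requirements $c_1 \ldots c_m$ when using k hypercontexts. The optimal solution for the PHC-DAG model can then be derived from this matrix with standard dynamic programming techniques.

In the following let $h_{ij}$ be a cheapest hypercontext that satisfies the context requirements $c_i, \ldots, c_j$. The algorithm for the PHC-DAG problem consists of the following steps:

(1) Preprocessing: computation of $h_{ij}$ and $\mathrm{cost}(h_{ij})$ for all $1 \le i \le j \le m$.
(2) Initialization: $M_{1,j} = w + j \cdot \mathrm{cost}(h_{1j})$ for $j \in [1:m]$.
(3) Main step: for $k = 2, \ldots, m$ and $j = k, \ldots, m$ compute $M_{k,j}$ according to
$$M_{k,j} = \min_{k-1 \le i < j} \bigl\{ M_{k-1,i} + w + (j - i) \cdot \mathrm{cost}(h_{i+1,j}) \bigr\}.$$
(4) Computation of the quality of the optimal solution from the matrix M by determining $\min_{1 \le k \le m} M_{k,m}$.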
Run time analysis:
The preprocessing step takes time $O(m^2 \cdot \delta)$, where $\delta$ is the cost of computing the set of all minimal hypercontexts from two sets of hypercontexts in the DAG that are predecessors of at least one hypercontext in both sets, and of then determining the cheapest of these hypercontexts.
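The table computation for PHC-DAG can be sketched in Python as follows. The sketch assumes init(h) = w for all h and uses the natural recurrence in which the kth hypercontext covers $c_{i+1} \ldots c_j$, i.e., $M_{k,j} = \min_{k-1 \le i < j} (M_{k-1,i} + w + (j - i) \cdot \mathrm{cost}(h_{i+1,j}))$. It represents each hypercontext only by its set h(C) and its cost, and its preprocessing simply filters all hypercontexts rather than exploiting the DAG, so it runs in $O(m^2 \cdot |H| + m^3)$ time; the function name and data layout are hypothetical.

```python
import math


def phc_dag(contexts, hyper_sat, hyper_cost, w):
    """Dynamic program M[k][j] for PHC-DAG (illustrative sketch).

    contexts:   list of context-requirement ids c_1..c_m
    hyper_sat:  dict hypercontext -> set of ids it satisfies, i.e. h(C)
    hyper_cost: dict hypercontext -> cost(h)
    w:          hyperreconfiguration cost, init(h) = w for every h
    Returns the optimal total cost and the table M (1-based indices).
    """
    m = len(contexts)
    inf = math.inf
    # Preprocessing: best[i][j] = cost of a cheapest h satisfying c_i..c_j.
    best = [[inf] * (m + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        candidates = list(hyper_sat)
        for j in range(i, m + 1):
            candidates = [h for h in candidates if contexts[j - 1] in hyper_sat[h]]
            best[i][j] = min((hyper_cost[h] for h in candidates), default=inf)
    # Initialization: a single hypercontext for the prefix c_1..c_j.
    M = [[inf] * (m + 1) for _ in range(m + 1)]
    for j in range(1, m + 1):
        M[1][j] = w + j * best[1][j]
    # Main step: the k-th hypercontext covers c_{i+1}..c_j.
    for k in range(2, m + 1):
        for j in range(k, m + 1):
            M[k][j] = min(M[k - 1][i] + w + (j - i) * best[i + 1][j]
                          for i in range(k - 1, j))
    # Quality of the optimal solution: best over all numbers of hypercontexts.
    return min(M[k][m] for k in range(1, m + 1)), M


opt, M = phc_dag(["a", "a", "b"],
                 {"ha": {"a"}, "hb": {"b"}, "full": {"a", "b"}},
                 {"ha": 1, "hb": 1, "full": 3}, w=2)
print(opt)  # 7: "ha" for c_1 c_2, then "hb" for c_3
```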
Polynomial time algorithm for PHC-Switch
In this section we describe a dynamic programming solution for the PHC-Switch problem. The algorithm computes a matrix $M = (M_{k,j})$, where $M_{k,j}$ are the minimal costs for the prefix of length j of the sequence of context requirements $c_1 \ldots c_m$ when using k hypercontexts. The optimal solution for PHC-Switch can then be derived from this matrix. The algorithm is designed such that each row of the matrix can be determined in time $O(n \cdot m)$, so that the total run time is in $O(n \cdot m^2)$.
In the following we describe the computation of a single matrix element in the main step of the algorithm. We assume that all elements in row k − 1 of M and all elements in row k up to column j have already been computed. In order to search efficiently for good places to introduce the kth hyperreconfiguration, we introduce a pointer structure over parts of the sequence of context requirements $c_1 \ldots c_m$. First we describe the pointer structure over the sequence $c_k \ldots c_j$ for the computation of $M_{k,j}$, and then show how it can be extended to a pointer structure over the sequence $c_k \ldots c_{j+1}$ for the computation of $M_{k,j+1}$.
The first context requirements in each of the sequences of context requirements $T_h, \ldots, T_1$ are linked by so-called equal cost pointers, i.e., there is a pointer to the first context requirement in $T_h$, from there to the first context requirement in $T_{h-1}$, and so forth. Moreover, within each equal cost interval the indices x with a minimum cost interval that is empty or contains only values that are smaller than the actual costs $\mathrm{cost}(h_{x,j+1})$ are linked in order of increasing value by so-called minimum cost pointers. In addition, there is a pointer from the first context requirement of the interval to the last useful index in the interval. This pointer is called the end pointer of the equal cost interval. All indices within an equal cost interval that are linked by minimum cost pointers are called useful. All other indices are called useless and will be marked as useless by the algorithm. The following two facts, which are not hard to show, are used for the run time analysis and to show the correctness of the algorithm.
$\ldots, T_h$ of $[k : j]$ and its corresponding pointers in time $O(n)$.
To see that this is true, observe that each string in $U_1, \ldots, U_g$ can be obtained by merging (or copying) neighboring strings from $T_1, \ldots, T_h$, and that $U_g$ contains in addition the context requirement $c_{j+1}$. (a) For each of the h intervals, consider the minimum cost interval of the index to which the first minimum cost pointer points. If this minimum cost interval does not contain a value that is at least as large as $\mathrm{cost}(h_{s,j+1})$, then the index is marked as useless and the first pointer is merged with the next pointer. This process continues until every first minimum cost pointer points to a useful index.
(b) Now it remains to update the minimum cost intervals by selecting, for each cost value, only the best index from the h merged intervals. This can be done from left to right, starting with the smaller cost values. During the search for the corresponding minimum cost interval, all indices that are passed are marked useless. The process stops when the best minimum cost interval with value n is found. During the search a pointer is set from the rightmost useful index of an interval to the first useful index in its right neighbor; thereby it might be necessary to jump over intervals that have no useful index left. The end pointer of the first interval is set to point to the last useful index of the merged intervals.
Since the total number of intervals in the equal cost partition for $[k : j+1]$ is at most n minus the number of merged intervals, the time to compute $M_{k,j+1}$ is at most $O(n + q)$, where q is the total number of indices that are marked useless. Since at most $m - k$ indices exist in row k of matrix M and each index is marked useless at most once, it follows that the total time of all steps (iii) for computing the elements in this row is $O(n \cdot m + m) = O(n \cdot m)$.
Theorem 3. The PHC-Switch problem can be solved in time $O(n \cdot m^2)$.
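For checking small instances, a simpler dynamic program reaches the same $O(n \cdot m^2)$ bound when the number of hypercontexts is left free. The Python sketch below is not the pointer-based algorithm of this section and does not compute the rows $M_{k,\cdot}$; it keeps one value best[j] per prefix and grows the last substring leftwards, using the union of its requirements as the hypercontext. The function name and the representation of contexts as sets of switch indices are assumptions.

```python
import math


def phc_switch_min_cost(n: int, contexts: list) -> int:
    """Minimum total cost over all partitions for PHC-Switch (sketch).

    contexts: list of sets of switch indices (c_1..c_m).
    best[j] is the optimal cost for the prefix c_1..c_j with any number of
    hypercontexts; the last substring c_i..c_j uses the union of its
    requirements as hypercontext, costing n + (j - i + 1) * |union|.
    """
    m = len(contexts)
    best = [0] + [math.inf] * m
    for j in range(1, m + 1):
        union = set()
        for i in range(j, 0, -1):     # grow the last substring leftwards
            union |= contexts[i - 1]  # O(n) per update, O(n * m^2) overall
            best[j] = min(best[j], best[i - 1] + n + (j - i + 1) * len(union))
    return best[m]


print(phc_switch_min_cost(4, [{0}, {0}, {0}, {0}, {0, 1, 2, 3}]))  # 16
```

For the example, one hyperreconfiguration into {0} for the first four contexts and one into all four switches for the last gives (4 + 4·1) + (4 + 1·4) = 16, which beats a single hypercontext (4 + 5·4 = 24).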
PHC with changeover costs
In this section we study a variant of the PHC problem where the cost for a hyperreconfiguration depends not only on the new hypercontext but also on the predecessor hypercontext. Parts of the hyperreconfiguration costs can then be considered as changeover costs and therefore we call this problem the PHC problem with changeover costs. This problem is used to model architectures where during hyperreconfiguration it is not necessary to specify the new hypercontext from scratch but where it is possible to define the new hypercontext through its difference to the old hypercontext.
For this problem the changeover costs between two hypercontexts are defined as the number of switches for which the state has to be changed for the new hypercontext (i.e., the state is changed from available to not available or vice versa). Formally, the problem is defined as follows.
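PHC-Switch with changeover costs problem: Given an instance of the PHC-Switch problem, where $\mathrm{init}(h) = w$ for $h \in H$ with $w > 0$, the cost function changeover on $H \times H$ defined by $\mathrm{changeover}(h_1, h_2) := |h_1 \triangle h_2|$, where $\triangle$ denotes the symmetric difference, and an initial hypercontext $h_0 \in H$. Find a partition of C into substrings $S_1, \ldots, S_r$, $r \ge 1$ (i.e., $C = S_1 \ldots S_r$), and hypercontexts $h_1, \ldots, h_r$ such that $S_i \subseteq h_i(C)$ and the total costs

$$\sum_{i=1}^{r} \bigl( w + \mathrm{changeover}(h_{i-1}, h_i) + |S_i| \cdot |h_i| \bigr)$$

are minimal.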
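The changeover cost and the resulting cost of a solution can be sketched in Python as follows. The sketch assumes that each hyperreconfiguration costs w plus the changeover from the previous hypercontext, and that each reconfiguration in hypercontext $h_i$ costs $|h_i|$; the function names are hypothetical.

```python
def changeover(h1: set, h2: set) -> int:
    """Number of switches whose availability changes: |h1 symmetric-difference h2|."""
    return len(h1 ^ h2)


def total_cost_changeover(w: int, h0: set, parts: list) -> int:
    """Total cost of (h_1, S_1), ..., (h_r, S_r) starting from hypercontext h0.

    Assumed cost per substring: w + changeover(h_{i-1}, h_i) + |S_i| * |h_i|.
    """
    total, prev = 0, h0
    for h, S in parts:
        if not all(c <= h for c in S):
            raise ValueError("hypercontext does not satisfy its substring")
        total += w + changeover(prev, h) + len(S) * len(h)
        prev = h
    return total


parts = [({0}, [{0}, {0}]), ({0, 1}, [{0, 1}])]
print(total_cost_changeover(1, set(), parts))  # 8 = (1 + 1 + 2*1) + (1 + 1 + 1*2)
```

Unlike the plain Switch-model, the cost of the second hyperreconfiguration here depends on the first hypercontext, which is why minimal hypercontexts alone no longer suffice.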
The PHC-Switch with changeover costs problem is more difficult to solve than the PHC-Switch problem because the costs of a hypercontext depend on the predecessor hypercontext. A consequence is that $\preceq$ is not cost consistent for this model, and therefore it is not enough to consider only minimal hypercontexts in the sense defined in Section 3. In the following we describe a polynomial time dynamic programming algorithm for the PHC-Switch with changeover costs problem. Due to the cost model for hyperreconfiguration, it is not advantageous to remove a switch x from the hypercontext and add it again when fewer than 3 contexts not using x lie between the corresponding hyperreconfigurations. Therefore, we assume that for each switch x every maximal subsequence $c_i, \ldots, c_j$ of context requirements that all do not use x has length at least 3. It is easy to guarantee in time $O(m \cdot n)$ that this property holds. For ease of description we assume that at least 5 hyperreconfigurations are done (including the hyperreconfiguration that is performed before the first context).
The algorithm computes a table $M = (M_{k,l,j,z})$ with $z \in \{0, 1, 2\}$, $j < l - 1$ for $z = 0$, and $j < l - 2$ for $z \in \{1, 2\}$, where $M_{k,l,j,z}$ are the minimal costs for the prefix of length $l - 1$ of the sequence of contexts when using k hypercontexts and
• for $z = 0$, a hyperreconfiguration is done at l and the last hyperreconfiguration before $c_{l-1}$ is at $j < l - 1$;
• for $z = 1$, hyperreconfigurations are done at l and $l - 1$, and the last hyperreconfiguration before $c_{l-2}$ is at $j < l - 2$;
• for $z = 2$, hyperreconfigurations are done at l, $l - 1$, and $l - 2$, and the last hyperreconfiguration before $c_{l-3}$ is at $j < l - 2$.

In all three cases the minimal costs include only the cost w for the hyperreconfiguration at l, but not the costs for removing or adding switches from/to the hypercontext during this hyperreconfiguration. The reason is that when $M_{k,l,j,z}$ is computed, it is only decided that a hyperreconfiguration is done at l, but not how the hypercontext is defined. This hypercontext is determined, and the corresponding costs for adding/removing switches are counted, when the (k + 1)th hypercontext is introduced or when the final costs for the reconfigurations up to the last context are determined. It is easy to see that all values $M_{k,l,j,z}$ with $k = 5$ can be computed in time $O(m^3 \cdot n)$, similarly to the values with $k \ge 6$ as described in the following. We describe only the main step of the algorithm, which computes $M_{k,l,j,z}$ when all values $M_{k-1,\cdot,\cdot,\cdot}$ have already been computed.
The algorithm needs a preprocessing step in which, for each $i \in [1:m]$, a sorted list of all switches that are not used at step i is created. This list is called the list of non-used switches of i and is denoted list(i). For each element $x \in \mathrm{list}(i)$, the index $v_i(x)$ of the first context after context $c_i$ in which x is used again is computed. Also, the index $u_i(x)$ of the last context in which x was used before context $c_i$ is computed. Clearly this preprocessing step can be done in time $O(m \cdot n)$. For the description of the main step of the algorithm we distinguish the following cases depending on the value of z.
(1) Case $z = 0$: $M_{k,l,j,0}$ with $j < l - 1$ is the minimum over all the following cost values:
+ the number of switches $x \in \mathrm{list}(j-1)$ that have to be included additionally at the (k − 1)th hyperreconfiguration at j because they are used in context $c_{l-1}$ or earlier (i.e., $v_{j-1}(x) < l$) and that are not included in the hypercontext at h (because $u_{j-1}(x) < h$),
+ the number of switches $y \in \mathrm{list}(j)$ that have to be removed from the hypercontext at j because they are included in the hypercontext at h (because $h \le u_j(y)$) and are not used again before $c_l$ (i.e., $l \le v_j(y)$),
+ the cost for reconfiguration of the contexts $c_j, \ldots, c_{l-1}$, i.e., the number of switches that are (now) in the hypercontext at j times $(l - j)$,
+ the number of switches $x \in \mathrm{list}(j-1)$ that have to be included in the hypercontext additionally at j with $v_{j-1}(x) < l$ and $u_{j-1}(x) < h$ (respectively, $u_{j-1}(x) < j - 2$) (similarly to case (i), but see the following remark),
+ the number of switches $y \in \mathrm{list}(j)$ that have to be removed from the hypercontext at j (because $v_j(y) \ge l$) and are not included in the hypercontext at $j - 1$ (because $u_j(y) = j - 1$),
+ the number of switches that are (now) in the hypercontext at j times $(l - j)$.
Remark. Observe that for
have been removed from the hypercontext at the hyperreconfiguration at $j - 1$. When for such a switch $v_{j-1}(x) < l$, it has to be included again during the hyperreconfiguration at j. But since it is cheaper not to perform these two changes of the state of switch x, we assume that x has not been removed at the hyperreconfiguration at $j - 1$. Note that this does not change the correct value of M(k − 1, j, h, 1) (respectively, M(k − 1, j, h, 2)).
+ the number of switches $x \in \mathrm{list}(l-2)$ that have to be included additionally at $l - 1$ because $v_{l-2}(x) = l - 1$ and $u_{l-2}(x) < j$ (respectively, $u_{l-2}(x) < l - 3$) (similarly to (1.i), but see the following remark),
+ the number of switches $y \in \mathrm{list}(l-1)$ that have to be removed from the hypercontext at $l - 1$ because $v_{l-1}(y) \ge l$ and $u_{l-1}(y) = l - 2$ (similarly to (1.i)),
+ the number of switches that are (now) in the hypercontext at $l - 1$.
Remark. Note that each switch $x \in \mathrm{list}(j-1)$ with $v_{l-2}(x) = l-1$ has $u_{j-1}(x) < l-4$. Then observe that switches $x \in \mathrm{list}(j-1)$ with $j \le u_{j-1}(x) < l-2$ have been removed from the hypercontext at the hyperreconfiguration […]. If such a switch satisfies $v_{l-2}(x) = l-1$, it has to be included again during the hyperreconfiguration at $l-1$. But since it is cheaper not to perform these two changes of the state of switch $x$, we assume that $x$ has not been removed at the hyperreconfiguration at $l-1$ (respectively, $l-3$). Note that this does not change the correct value of $M(k-1, l-1, j, 1)$.
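The three cost terms in the case analysis above (switches additionally included, switches removed, and the hypercontext size times the number of contexts) can be illustrated with a short Python sketch. It is a simplified stand-in, not the $M(k, l, j, z)$ table itself: hypercontexts and context requirements are encoded as integer bitmasks, each hypercontext is assumed to be exactly the union of the requirements of its block of consecutive contexts, and the sketch minimizes over all block boundaries. Because the remarks above show that keeping an unused switch can be cheaper than removing and re-including it, this restriction may miss the true optimum. The function names and the base-cost parameter `k` are illustrative assumptions.

```python
from __future__ import annotations


def popcount(mask: int) -> int:
    """Number of switches (set bits) in a bitmask."""
    return bin(mask).count("1")


def union(block: list[int]) -> int:
    """Hypercontext assumed for a block: OR of its context requirements."""
    h = 0
    for c in block:
        h |= c
    return h


def block_cost(prev_h: int, block: list[int], k: int = 0) -> tuple[int, int]:
    """Hyperreconfiguration from prev_h plus reconfiguring every context in block.

    Returns (cost, new hypercontext)."""
    h = union(block)
    included = popcount(h & ~prev_h)  # switches added to the hypercontext
    removed = popcount(prev_h & ~h)   # switches dropped from the hypercontext
    reconfig = popcount(h) * len(block)  # |h| per context in the block
    return included + removed + k + reconfig, h


def phc_changeover_union(contexts: list[int], k: int = 0) -> int:
    """Minimum total cost over all splits of contexts into consecutive blocks,
    each preceded by one hyperreconfiguration; the initial hypercontext is empty."""
    m = len(contexts)
    # best[(i, j)]: cheapest cost for contexts[:j] whose last block is contexts[i:j]
    best: dict[tuple[int, int], int] = {}
    for j in range(1, m + 1):
        for i in range(j):
            block = contexts[i:j]
            if i == 0:
                best[(0, j)] = block_cost(0, block, k)[0]
            else:
                # try every predecessor block contexts[p:i]
                best[(i, j)] = min(
                    best[(p, i)] + block_cost(union(contexts[p:i]), block, k)[0]
                    for p in range(i)
                )
    return min(best[(i, m)] for i in range(m))
```

For example, `phc_changeover_union([0b01, 0b01, 0b10, 0b10])` returns 7, while raising the base cost to `k=5` makes a single hypercontext optimal (cost 15). The nested loops over block boundaries favor readability over speed.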
To find the optimal solution for the PHC-Switch problem with changeover costs from the table $(M(k, l, j, z))$, one has to compute for each element of the table the total costs for hyperreconfigurations and reconfigurations. For element $M(k, l, j, z)$ one has to determine which switches have to be removed from or included into the hypercontext at $l$ (similarly as above). Then the number of these switches plus $M(k, l, j, z)$ plus the costs for the configurations of contexts $c_l, \ldots, c_m$ give the total costs. The minimum over all total costs gives the cost of the optimal solution. It is not hard to also determine the optimal solution itself using standard techniques for dynamic programming.

For the PHC-DAG problem with changeover costs we assume that for each pair of hypercontexts $h_1, h_2$ changeover costs $\mathrm{changeover}(h_1, h_2)$ are defined. Formally, the problem is defined as follows. The proof is omitted because the problem can be solved with dynamic programming techniques similar to those shown above.
PHC-DAG with changeover costs

Experiments and results
In order to illustrate the concept of hyperreconfiguration and to make a basic evaluation of the models and algorithms presented above, two differently grained hyperreconfigurable architectures are considered in this section. As an example of a simple model of a fine-grained dynamically reconfigurable machine, we define the Simple HYperReconfigurable Architecture (SHyRA) (see Fig. 3). It consists of a set of reconfigurable Look-Up Tables (LUTs), each with three inputs and one output. Signals are stored in a register file whose registers are reconfigurably connected to the LUTs by a multiplexer (MUX) and a demultiplexer (DeMUX). The system interfaces to the outside world through the content of the register file. The execution of a computation on SHyRA is cycle based. Each computational cycle consists of the following two steps: (i) input stimuli are propagated from the registers through the multiplexer to the corresponding inputs of the LUTs, and (ii) a new output is generated in the LUTs and stored through the demultiplexer into the register file. Hyperreconfiguration and reconfiguration are possible before every computational cycle. Because the architecture cannot directly chain the LUTs within one computational cycle, application processes are forced to make extensive use of (hyper)reconfigurations. The Switch-model of hyperreconfigurable architectures fits the SHyRA architecture because each reconfiguration bit of each of the three types of reconfigurable resources can be viewed as a binary switch.
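To make the two-step computational cycle concrete, here is a minimal Python sketch of one SHyRA cycle. The encoding is an assumption for illustration only: register values are single bits, and each LUT's context consists of three register indices (the MUX setting), an 8-entry truth table, and one destination register (the DeMUX setting). `LutConfig` and `shyra_cycle` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LutConfig:
    """Hypothetical per-LUT context: MUX, truth-table and DeMUX settings."""
    inputs: tuple[int, int, int]  # register indices routed to the 3 LUT inputs
    truth_table: int              # bit p holds the output for input pattern p (0..7)
    dest: int                     # register that receives the LUT output


def shyra_cycle(regs: list[int], luts: list[LutConfig]) -> list[int]:
    """One computational cycle: (i) registers -> MUX -> LUT inputs,
    (ii) LUT outputs -> DeMUX -> registers."""
    outputs = []
    for lut in luts:
        # step (i): every LUT reads the register values from the start of the cycle
        a, b, c = (regs[i] for i in lut.inputs)
        pattern = a | (b << 1) | (c << 2)
        outputs.append((lut.dest, (lut.truth_table >> pattern) & 1))
    # step (ii): write back; if two LUTs target the same register, the later one wins
    new_regs = list(regs)
    for dest, value in outputs:
        new_regs[dest] = value
    return new_regs
```

For instance, `LutConfig((0, 1, 3), 0x66, 2)` stores the XOR of registers 0 and 1 in register 2. Since all LUTs read the register values from the start of the cycle, a result computed by one LUT can only feed another LUT in the next cycle, which is why SHyRA applications need frequent (hyper)reconfigurations.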
As a test application for the SHyRA we have chosen a part of a simple control task consisting of a sequence of 5 computation phases with counter and adder circuits. After mapping the design onto the reconfigurable resources (LUTs and MUX/DeMUX), the sequence of context requirements depicted in Fig. 4 was determined by analyzing the change of the reconfiguration bits at each clock cycle.
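The text does not spell out how the bit changes are turned into context requirements. One plausible reading, stated here purely as an assumption, is that a switch belongs to the context requirement of a cycle when its reconfiguration bit differs from the previous cycle, starting from an all-zero configuration. A minimal Python sketch of that reading, with configurations as integer bitmasks and a hypothetical function name:

```python
from __future__ import annotations


def context_requirements(configs: list[int]) -> list[int]:
    """Assumed rule: a switch is required in cycle t if its reconfiguration
    bit differs from cycle t-1 (the configuration before cycle 0 is all zeros)."""
    reqs = []
    prev = 0
    for cfg in configs:
        reqs.append(cfg ^ prev)  # set bits mark switches whose state changes
        prev = cfg
    return reqs
```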
For a more coarse-grained example, a VLIW processor is seen as a hyperreconfigurable machine in which the functional units are the reconfigurable resources. A reconfiguration step takes place every clock cycle. Executing an algorithm on a VLIW processor thus closely resembles a rapidly reconfiguring system. Because of the high number of functional units, a VLIW code word rather frequently encodes several idle operations for units that currently cannot be used due to a lack of parallelism in the algorithm. The concept of hyperreconfigurability can be employed to limit the effect of these idle operations. The TMS320C6201 signal processor from Texas Instruments was used for the experiments. It features 4 types of functional units (L = logic, M = multiply, D = load/store, S = shift) with 2 units each, for a total of 8 functional units. The accompanying C compiler produces optimized and parallelized code and also outputs the assembly code, which was used for our simulations. As an example algorithm, a vector summation was implemented in C, compiled into parallelized VLIW code, and executed. During execution the state (idle or busy) of each functional unit was recorded for each clock cycle, yielding an 8-bit vector for every clock cycle. The sequence of these vectors can be seen as a sequence of context requirements (see Fig. 5) for a hyperreconfigurable machine that works in the Switch-model. The test run with a vector size of 44 resulted in 62 executed VLIW instructions, corresponding to 62 context requirements. The obtained sequences of context requirements for the fine-grained SHyRA and the coarse-grained VLIW can be seen as instances of the PHC-Switch problem.
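The recording step can be sketched in Python as follows. The bit order of the eight units (L1, L2, M1, M2, D1, D2, S1, S2) and the trace format (one set of busy unit names per clock cycle) are assumptions made for this illustration; they are not taken from the experiments.

```python
from __future__ import annotations

# Hypothetical bit order: bit i of a vector is 1 if UNITS[i] is busy in that cycle.
UNITS = ["L1", "L2", "M1", "M2", "D1", "D2", "S1", "S2"]


def busy_vectors(trace: list[set[str]]) -> list[int]:
    """Turn a per-cycle set of busy units into one 8-bit context requirement per cycle."""
    index = {unit: i for i, unit in enumerate(UNITS)}
    vectors = []
    for busy in trace:
        mask = 0
        for unit in busy:
            mask |= 1 << index[unit]  # idle units keep their bit at 0
        vectors.append(mask)
    return vectors
```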
For both problem instances, the optimal times for hyperreconfigurations and the corresponding hypercontexts were determined with the algorithm from Section 6 for standard costs and with the algorithm from Section 7 for changeover costs. Both test instances were evaluated with different init costs for the hyperreconfiguration. For the standard model the costs were $\mathrm{init}(h) = n + k$, where $n$ is the total number of bits ($n = 144$ for the SHyRA and $n = 8$ for the VLIW) and $k \in [0, 150]$ can be seen as additional base costs. For the changeover cost model the costs were $\mathrm{init}(h_2) = |h_1 \triangle h_2| + k$, where $h_1, h_2$ are the hypercontexts before and after the hyperreconfiguration, $|h_1 \triangle h_2|$ is the size of their symmetric difference (the number of switches whose membership changes), and $k \in [0, 150]$ can be seen as the base costs. Fig. 6 shows the optimal total (hyper)reconfiguration costs for different base costs $k \in [0, 150]$ for the control task on SHyRA. The figure also shows the total number of hyperreconfigurations. It can be observed that the costs for the PHC-Switch model with changeover costs are significantly lower than in the standard PHC-Switch model for the same base costs. The reason is that the changeover model can profit from the fact that for this application consecutive hypercontexts are often similar. Therefore, the PHC-Switch model with changeover costs can profit more from hyperreconfigurations, in the sense that its optimal solution performs more hyperreconfigurations. Especially when the base costs are low, the number of hyperreconfigurations is much higher than for the standard cost model.
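For the standard model with $\mathrm{init}(h) = n + k$, a compact dynamic program suffices once one assumes that a context requirement fits a hypercontext exactly when it is a subset of it. Then, with a constant init cost, the cheapest hypercontext for a block of consecutive contexts is the union of their requirements, since any extra switch only adds reconfiguration cost. The Python sketch below is offered under that assumption; it is not claimed to be the algorithm from Section 6, and the function names are hypothetical. It also shows the changeover init cost, computed as the number of bits in which two hypercontexts differ.

```python
from __future__ import annotations


def popcount(mask: int) -> int:
    """Number of switches (set bits) in a bitmask."""
    return bin(mask).count("1")


def changeover_init(h1: int, h2: int, k: int) -> int:
    """init(h2) = |h1 symmetric-difference h2| + k for the changeover model."""
    return popcount(h1 ^ h2) + k


def phc_switch_standard(contexts: list[int], n: int, k: int) -> tuple[int, int]:
    """Optimal total cost and number of hyperreconfigurations for init(h) = n + k.

    Each block of consecutive contexts uses the union of its requirements
    as hypercontext; only the block boundaries are optimized."""
    m = len(contexts)
    # best[j] = (cost, #hyperreconfigurations) for the prefix contexts[:j]
    best: list[tuple[int, int]] = [(0, 0)]
    for j in range(1, m + 1):
        h = 0
        candidates = []
        for i in range(j - 1, -1, -1):  # last block is contexts[i:j]
            h |= contexts[i]
            cost, count = best[i]
            candidates.append((cost + n + k + popcount(h) * (j - i), count + 1))
        best.append(min(candidates))  # ties broken by fewer hyperreconfigurations
    return best[m]
```

As the base cost `k` grows, the optimum merges blocks; for example, `phc_switch_standard([1, 1, 2, 2], 2, 0)` uses two hypercontexts, whereas with `k=10` a single hypercontext becomes optimal, mirroring the trend described for Fig. 6.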
The optimal costs and number of hyperreconfigurations for vector summation on the VLIW are shown in Fig. 7. Here again, the standard PHC-Switch model produces significantly higher costs when the base costs are small. But unlike for the SHyRA application, the number of hyperreconfigurations is often the same for both cost models. For higher base costs the total hyperreconfiguration costs for both cost models become very similar because in both cases only one hypercontext is used. Thus, the total hyperreconfiguration costs differ only by the cost of the first hyperreconfiguration (this is $n + k$ for standard costs and $n + k$ minus the number of bits that are never used for changeover costs).

In order to evaluate the possible profit from hyperreconfigurations, we compared the total (hyper)reconfiguration costs for each of the two hyperreconfiguration models with the case where hyperreconfigurations are not allowed. For the latter case it is assumed that all switches have to be fully specified at each reconfiguration step. Fig. 8 shows the relative total hyperreconfiguration costs for the hyperreconfigurable case (100% corresponds to the total reconfiguration costs when hyperreconfigurations are not allowed). For the PHC-Switch model with changeover costs and base costs zero, the total (hyper)reconfiguration costs are reduced to 36% for the control task and 53% for the vector summation example. Even the standard cost model reduced the costs to 47% and 57%, respectively, for base costs zero. Clearly, the reduction decreases when the base costs are increased. In the case of the vector summation, the break-even point, where employing the concept of hyperreconfiguration becomes equally expensive, appears at base costs of 118 for the standard model and 120 for the model with changeover costs. These are extreme cases: the size of a hypercontext (and thus its actual cost for specification) is 8, so the additionally introduced base costs are 14.75 times as large (118/8 for the standard model).
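The 100% baseline of Fig. 8 follows from the cost model if, as above, each specified switch costs one unit: without hyperreconfiguration all $n$ switches are specified at each of the $m$ reconfiguration steps, giving $n \cdot m$ (for the vector summation, $8 \cdot 62 = 496$). The small Python sketch below computes that ratio and the base-cost factor quoted for the break-even point; the function names are illustrative.

```python
def relative_cost(total_cost: float, n: int, m: int) -> float:
    """Total (hyper)reconfiguration cost as a fraction of the baseline n * m,
    where all n switches are specified at each of the m reconfiguration steps."""
    return total_cost / (n * m)


def base_cost_factor(base_cost: float, hypercontext_size: int) -> float:
    """How many times larger the base cost k is than the cost of specifying
    a hypercontext of the given size."""
    return base_cost / hypercontext_size
```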
For the control task instance, hyperreconfiguration reduced the costs by more than 40% even with base costs of 150. The results show that there is a large potential for increasing the speed of dynamic reconfigurations by using hyperreconfigurations. Of course, the cost model used here is only a simple example to illustrate our approach, and it remains to be determined for real architectures how exactly hyperreconfiguration can be realized.
Conclusion
The concept of hyperreconfigurable architectures was introduced in this paper. This was motivated by the observation that for dynamically reconfigurable architectures or systems there is a tradeoff between flexibility and the amount of reconfiguration information that has to be transferred to the reconfigurable units. Hyperreconfigurable architectures offer a new type of reconfiguration operation, called hyperreconfiguration, that allows them to alter the potential for reconfiguration that is available during run time. A central problem that emerges on hyperreconfigurable systems is to determine, for a given computation, the best points in time at which a hyperreconfiguration should be performed. This problem has been called the Partition into Hypercontexts (PHC) problem. It was shown that the general PHC problem is NP-complete. But for several models of hyperreconfigurable machines, fast polynomial time algorithms have been described to solve the PHC problem. As an illustrative example, and to evaluate the introduced concepts, we presented solutions of the PHC problem for example algorithms on a fine-grained and a coarse-grained reconfigurable architecture. The concept of hyperreconfiguration raises interesting questions for further research. For instance, how can such architectures be realized and what are the best cost measures? How can the resource requirements be estimated? What specific problems occur for multi-task hyperreconfigurable architectures? While we have considered the latter question in a companion paper [10], the other questions will be subject to our future research.
