Abstract-There is a trend towards networked and distributed hardware reconfigurable systems, which complicates the design process at the system level. This paper provides a solution to the problem of design space exploration for such next-generation embedded systems. We show the problems that occur when exploring the design space at the system level and that lead to new properties for valid implementations. The novelty of this approach lies in the support of explicit communication modeling and time-multiplexed architecture modeling in a single model. The proposed design space exploration is based on Evolutionary Algorithms and a new slack-based list scheduler.
multiplexed or mutually exclusive resources. With hardware reconfigurable resources, a new problem known as temporal partitioning has to be solved [4]. This problem can be formulated as a combined partitioning and scheduling problem. The goal is to partition operations into different configurations while preserving the partial order imposed by data dependencies across configuration boundaries. Approaches to the temporal partitioning problem are based on list scheduling, Integer Linear Programs, or network flow techniques. All these methods have in common that they assume dedicated fine-grain resources. Our approach differs in that configurations are composed of coarse-grain system-level components such as CPUs, busses, IP cores, or again FPGAs. Furthermore, our model is capable of modeling the whole system, not only a single FPGA.
As networked and distributed reconfigurable hardware platforms [6, 9] are becoming more and more important for applications in areas such as automotive and body area networks, there is a need to support the design of these systems at the system level as well. First examples of coarse-grain, networked reconfigurable architectures are PACT [1], Chameleon [5], and the ReCoNet architecture [7]. Most of the system-level design space exploration tools proposed in the literature are not able to deal with mutually exclusive resources. The CORDS system is an approach to the hardware-software partitioning problem based on Evolutionary Algorithms [6]. It considers hardware reconfigurable resources and reconfiguration times, and it assumes buffers for the communication, which implies implicit communication. Scheduling is performed by a preemptive static critical path scheduler. This approach is similar to ours but does not provide the explicit communication model that is necessary at the system level.
To the best of our knowledge, there is no design space exploration tool that supports both explicit communication and reconfigurable hardware. In this paper, we close this gap by presenting such a tool based on Evolutionary Algorithms. It strictly separates functionality from architecture. In particular, it allows configurations to be composed of system-level components. We will see that testing the feasibility of an implementation has to treat time-multiplexed resources in a special way. In the next section, we first discuss the problem of hardware-software partitioning for system-level applications with mutually exclusive resources. Section III then focuses on the feasibility of solutions found by our design space exploration tool. In Section IV, the design space exploration tool is briefly presented. Section V provides some conclusions.
II. PROBLEM FORMULATION
In this paper, we focus on a graph-based approach to the hardware-software partitioning problem while considering hardware reconfigurable components and explicit communication. The problem is to map a process graph onto components in such a way that the data dependencies given by the process graph can be handled in the resulting implementation. First, we introduce the graph-based model used throughout this paper. We assume the process graph to be a homogeneous dataflow model, i.e., a process can only be executed if all its predecessors have finished their computations. In order to model the underlying hardware of a system, we use a so-called architecture graph g_a.
Definition 1 (Process Graph)
A process graph g_p = (V_p, E_p) consists
Definition 2 (Architecture Graph)
An architecture graph g_a = (V_a, E_a) consists of a finite set of vertices V_a and a finite set of directed edges E_a ⊆ V_a × V_a. The vertices v ∈ V_a of the architecture graph g_a correspond to hardware modules like CPUs, memories, FPGAs, etc. The edges e ∈ E_a of the architecture graph g_a correspond to connections between hardware modules and model the topology of the architecture. Furthermore, vertices v ∈ V_a in the architecture graph may be refined by subgraphs g ∈ v.G associated with v.
In the context of reconfigurable hardware, an FPGA may be configured with different architectures, so-called configurations. If a set of subgraphs G ⊆ v.G is selected as a refinement of a vertex v, we are able to flatten our model. All subgraphs associated with the same vertex are mutually exclusive, i.e., at most one subgraph may be activated at each instant of time. Thus, a vertex v ∈ V_a representing a hardware reconfigurable node may be refined by a single subgraph g ∈ v.G modeling the selected configuration.
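The refinement of a reconfigurable vertex by one selected configuration can be illustrated with a minimal Python sketch. The class `ArchGraph`, the function `flatten`, and all resource and configuration names are hypothetical and chosen for illustration only. Since the text does not state how edges incident to a refined vertex are reconnected to the selected subgraph, the sketch simply drops those boundary edges.

```python
from dataclasses import dataclass, field


@dataclass
class ArchGraph:
    """Hierarchical architecture graph (hypothetical illustration type)."""
    vertices: set
    edges: set = field(default_factory=set)      # directed pairs (u, v)
    configs: dict = field(default_factory=dict)  # v -> {config name -> ArchGraph}, i.e. v.G


def flatten(g, selection):
    """Return (vertices, edges) of g with every refinable vertex replaced
    by the configuration named in `selection` (vertex -> config name).

    A dict holds one config per vertex, so at most one subgraph of a
    vertex is active, which mirrors the mutual exclusion in the text.
    Assumption: edges touching a refined vertex are omitted.
    """
    vertices, edges = set(), set()
    for v in g.vertices:
        if v not in g.configs:
            vertices.add(v)          # ordinary, non-refinable resource
            continue
        chosen = selection.get(v)
        if chosen is None:
            continue                 # no configuration selected for v
        sub_v, sub_e = flatten(g.configs[v][chosen], selection)
        vertices |= sub_v
        edges |= sub_e
    # Keep only edges between non-refinable vertices of this level.
    edges |= {(a, b) for (a, b) in g.edges
              if a not in g.configs and b not in g.configs}
    return vertices, edges


# Hypothetical example: one FPGA with two configurations plus a memory.
cfg1 = ArchGraph({"CPU", "COM"}, {("CPU", "COM")})
cfg2 = ArchGraph({"CPU", "COM", "IP"}, {("CPU", "COM"), ("IP", "COM")})
top = ArchGraph({"FPGA", "MEM"}, {("FPGA", "MEM")},
                {"FPGA": {"cfg1": cfg1, "cfg2": cfg2}})

flat_vertices, flat_edges = flatten(top, {"FPGA": "cfg1"})
```

Selecting `cfg2` instead yields a different flat resource set, but both can never be active in the same flattened model.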
The basic idea of a hierarchical architecture graph is shown in Figure 2. Figure 2(a) shows an FPGA and three associated configurations. The first configuration is composed of a CPU and a communication IP whereas the second configuration uses an additional IP core. The corresponding architecture graph is shown in Figure 2(b). It consists of a single vertex modeling the FPGA and three associated subgraphs representing the configurations. Finally, mapping edges are used to restrict the search space. A mapping edge m = (v_p, v_a) ∈ E_m models the fact that the process associated with vertex v_p can be implemented on the resource associated with vertex v_a. Furthermore, property values such as delay, power consumption, or area may be associated with the mapping edges or with the resources in the architecture graph. These properties are needed for optimization purposes. Now, the hardware-software partitioning problem can be formulated as follows: Given an acyclic process graph g_p, a hierarchical architecture graph g_a, and mapping edges E_m, find an allocation of resources and configurations as well as a corresponding feasible binding of processes to resources (cf. Figure 3). In contrast to Figure 1, resources may be refinable by configurations. The feasibility of an allocation and a binding will be discussed in the next section.
A specification as shown in Figure 3(a) usually contains several mapping edges per process, describing possible alternative bindings of this process. Moreover, the model is parameterized in a sense that attributes are associated with resources and
III. FINDING FEASIBLE SOLUTIONS
In this section, we focus on the feasibility of an allocation and a binding. As proposed by Blickle et al. [3], a feasible binding maps each process to exactly one allocated resource, and communicating processes are mapped onto the same resource or onto adjacent resources. That way, it is guaranteed that there is a communication channel for the required data exchange between communicating processes. An allocation is said to be feasible if it allows a feasible binding. This problem is much harder in the presence of mutually exclusive resources. A dataflow actor can fire iff all its inputs are available. Placing predecessors in mutually exclusive configurations may therefore invalidate a binding, since not all input data can be available at the same time. For instance, binding process P3 onto resource R1 in Figure 3(a) prohibits the execution of process P4 because processes P1 and P3 cannot exist in parallel due to the mutual exclusiveness of the two configurations. Besides this simple infeasibility, which can be detected locally, there is another problem regarding schedulability. The problem becomes visible when we try to schedule the process graph. A first attempt starts with process P1. After that, process P2 cannot be scheduled due to the mutual exclusiveness of the configurations. Hence, we cannot schedule process P3. On the other hand, if we start with process P2, we are able to schedule process P3. But now we detect a deadlock due to process P4, which needs the input data from P1. Scheduling P1 provides these input data, but at the same time the required input data of process P5 are deleted.
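The binding condition attributed to Blickle et al. above (every process bound to exactly one allocated resource, communicating processes on the same or on adjacent resources) can be sketched as a simple check. The function name, its parameters, and the example data are hypothetical. The sketch treats adjacency as undirected, which is an assumption, and it deliberately does not cover the additional conditions for mutually exclusive configurations.

```python
def is_feasible_binding(processes, process_edges, arch_edges, allocation, binding):
    """Check the basic binding feasibility condition.

    processes:     iterable of process names
    process_edges: iterable of (pred, succ) data dependencies
    arch_edges:    iterable of (r1, r2) connections between resources
    allocation:    set of allocated resources
    binding:       dict process -> resource (a dict gives "exactly one")
    """
    # Every process must be bound, and only processes may be bound.
    if set(binding) != set(processes):
        return False
    # Every used resource must be part of the allocation.
    if any(r not in allocation for r in binding.values()):
        return False
    # Assumption: a connection can be used in both directions.
    links = set(arch_edges) | {(b, a) for (a, b) in arch_edges}
    # Communicating processes need the same or an adjacent resource.
    return all(
        binding[p] == binding[q] or (binding[p], binding[q]) in links
        for (p, q) in process_edges
    )


# Hypothetical example: a chain P1 -> P2 -> P3 on three resources,
# of which only R1 and R2 are connected.
procs = ["P1", "P2", "P3"]
deps = [("P1", "P2"), ("P2", "P3")]
arch = [("R1", "R2")]
alloc = {"R1", "R2", "R3"}

ok = is_feasible_binding(procs, deps, arch, alloc,
                         {"P1": "R1", "P2": "R2", "P3": "R2"})
bad = is_feasible_binding(procs, deps, arch, alloc,
                          {"P1": "R1", "P2": "R2", "P3": "R3"})
```

In the second binding, P2 and P3 communicate across R2 and R3, which are not connected, so the check fails.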
The infeasibility of such implementations can be detected formally: Connected subgraphs mapped onto the same configuration as well as their direct successors are assigned to a cluster associated with the configuration (see Figure 3(b)). With this cluster information, a so-called cluster graph g_c for mutually exclusive configurations can be constructed. An example of a feasible implementation is shown in Figure 4(a). The corresponding cluster graph is given in Figure 4(c). Process P2 and its direct successors P3 and P6 are assigned to cluster Cluster1. Process P1 and its direct successor P4 are assigned to cluster Cluster2. For the sake of completeness, process P5 is assigned to an individual cluster. The Gantt chart for the implementation is shown in Figure 4(d). One can see that not only the processes are scheduled but also the associated configurations.
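The clustering rule stated above can be sketched in Python as follows. This covers only the assignment of processes to clusters, not the edges of the cluster graph. The function `build_clusters`, the dictionary `config_of`, and the example edges are hypothetical: the edge set below is chosen so that the result matches the clusters named in the text, and it is not claimed to be the process graph of Figure 4.

```python
def build_clusters(edges, config_of):
    """Group processes into clusters.

    edges:     set of (pred, succ) data dependencies
    config_of: dict process -> configuration name, or None if the
               process is bound to a non-reconfigurable resource
    Returns a list of (configuration or None, set of processes).
    """
    processes = sorted({p for e in edges for p in e} | set(config_of))
    succ = {p: set() for p in processes}
    for u, v in edges:
        succ[u].add(v)

    clusters, seen = [], set()
    for start in processes:
        cfg = config_of.get(start)
        if cfg is None or start in seen:
            continue
        # Connected subgraph of processes mapped onto the same configuration.
        component, stack = set(), [start]
        while stack:
            p = stack.pop()
            if p in component:
                continue
            component.add(p)
            for u, v in edges:
                if p in (u, v):
                    q = v if p == u else u
                    if config_of.get(q) == cfg:
                        stack.append(q)
        seen |= component
        # Add the direct successors of the connected subgraph.
        members = set(component)
        for p in component:
            members |= succ[p]
        clusters.append((cfg, members))

    # Remaining processes get an individual cluster each.
    clustered = set().union(*(m for _, m in clusters))
    for p in processes:
        if p not in clustered:
            clusters.append((None, {p}))
    return clusters


# Hypothetical process graph and binding to two configurations.
example_edges = {("P1", "P4"), ("P2", "P3"), ("P2", "P6"), ("P4", "P5")}
example_config = {"P1": "cfg2", "P2": "cfg1"}
example_clusters = build_clusters(example_edges, example_config)
```

In this example, P2 with P3 and P6 forms the cluster of `cfg1`, P1 with P4 forms the cluster of `cfg2`, and P5 ends up in a cluster of its own.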
Definition 3 (Cluster Graph
IV. DESIGN SPACE EXPLORATION
Our design space exploration tool MoDES (Model-based Design space exploration for Embedded Systems) is based on Multi-Objective Evolutionary Algorithms (MOEAs). It uses state-of-the-art optimization algorithms provided by the PISA framework [2]. The individuals encode all information necessary to represent an implementation and, in addition, to find new solutions. The allocation of resources and configurations is encoded in a bit string, while the binding of processes to resources is based on priority lists. We implemented a repair mechanism similar to the one proposed in [3], where resources are added on demand following a priority strategy. Implementations which are still infeasible afterwards are penalized by infinite objective values. This approach differs from CORDS [6] in that we allow infeasible implementations in our exploration. Hence, we omit the computationally intensive clustering of implementations with the same target architecture. MoDES is adaptable in the number and kind of objectives, where all objectives have to be derived from parameters given in the specification. In order to support hardware reconfigurable devices, we have implemented the feasibility check discussed in detail in Section III. Furthermore, the scheduling strategy used by MoDES must consider cluster dependencies in order to compute feasible schedules. The implemented scheduler is a slack-based list scheduler. This scheduler is modified such that it can handle cluster dependencies, i.e., after computing the cluster graph, a schedulable operation bound to a configuration is scheduled iff all preceding clusters are already scheduled. The scheduling algorithm is depicted in Figure 5. In a first step, the list of schedulable processes is determined. Next, each resource is tested for availability. Afterwards, all processes which do not depend on the finishing of a cluster are stored in the schedule list V_k for resource k.
Then, the process with the highest priority (smallest slack) is selected for scheduling. Of course, this scheduling strategy generally produces suboptimal schedules.
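The scheduling steps described above can be sketched in runnable form. The following Python function is a simplified illustration and not the MoDES implementation. It assumes unit execution times (so every resource is available again in each time step), precomputed slack values, and at most one cluster per process. All names (`list_schedule`, `cluster_of`, `cluster_preds`) and the example data are hypothetical.

```python
def list_schedule(procs, edges, binding, slack, cluster_of=None, cluster_preds=None):
    """Slack-based list scheduling with cluster dependencies.

    procs:         list of process names
    edges:         iterable of (pred, succ) data dependencies
    binding:       dict process -> resource
    slack:         dict process -> slack (smaller means higher priority)
    cluster_of:    dict process -> cluster id (optional)
    cluster_preds: dict cluster id -> set of preceding cluster ids (optional)
    Returns tau, a dict process -> start time step.
    """
    cluster_of = cluster_of or {}
    cluster_preds = cluster_preds or {}
    preds = {p: set() for p in procs}
    for u, v in edges:
        preds[v].add(u)
    members = {}
    for p, c in cluster_of.items():
        members.setdefault(c, set()).add(p)

    tau, t = {}, 0
    while len(tau) < len(procs):
        done = set(tau)  # unit delay: everything started earlier has finished
        ready = [p for p in procs if p not in done and preds[p] <= done]

        def cluster_ok(p):
            # All processes of all preceding clusters must be finished.
            c = cluster_of.get(p)
            return all(members.get(d, set()) <= done
                       for d in cluster_preds.get(c, ()))

        progress = False
        for k in sorted(set(binding.values())):
            candidates = [p for p in ready if binding[p] == k and cluster_ok(p)]
            if candidates:
                best = min(candidates, key=lambda p: (slack[p], p))
                tau[best] = t
                progress = True
        if not progress:
            raise ValueError("no schedulable process: dependencies form a deadlock")
        t += 1
    return tau


# Hypothetical example: P1 and P2 share resource R1; P2 has less slack.
procs = ["P1", "P2", "P3", "P4"]
edges = [("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
binding = {"P1": "R1", "P2": "R1", "P3": "R2", "P4": "R1"}
slack = {"P1": 1, "P2": 0, "P3": 0, "P4": 0}

plain = list_schedule(procs, edges, binding, slack)
clustered = list_schedule(procs, edges, binding, slack,
                          cluster_of={"P1": "C1", "P2": "C2"},
                          cluster_preds={"C2": {"C1"}})
```

Without cluster dependencies, P2 is started first because of its smaller slack. With cluster C2 depending on C1, P1 has to be scheduled before P2 despite its larger slack.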
We have tested the MoDES framework on the example of a JPEG encoder mapped onto reconfigurable hardware. MoDES can be extended to consider reconfiguration times as proposed in [6].
LIST SCHEDULER
INPUT: g_p, α, β
OUTPUT: τ
numsched = 0; t = 0;
WHILE (numsched ≤ |V_p|) DO
  readylist = no_pred(V_p);
  FORALL (k ∈ α) DO
    IF (not available(k)) THEN BREAK;
    V_k = no_pred_cluster(readylist, k, β);
    v_k = highest_priority(V_k);
    τ(v_k) = t
  ENDFOR
  t = t + 1;
ENDWHILE
RETURN τ
END

V. CONCLUSIONS
Due to the trend towards networked and distributed hardware reconfigurable systems, new design space exploration methodologies are needed at the system level. New results presented in this paper include (a) the support of explicit communication modeling and (b) the modeling of time-multiplexed architectures as necessary to model reconfigurable hardware. It has been shown that (c) new properties for feasible solutions have to be established and tested for such systems during design space exploration. We have shown that constructing the so-called cluster graph helps to detect hidden deadlocks which arise from mutually exclusive resources. Based on these results, a slack-based list scheduler for time-multiplexed architectures and explicit communication was proposed.
