The size of search spaces in embedded system design is one of the most critical problems during design space exploration. Pareto-Front Arithmetics (PFA) has shown to be useful to overcome this problem by decomposing a hierarchical search space and just exploring each part of the system separately. Later, the exploration results are combined at higher levels of the hierarchy. In order to decrease the exploration time, this combination is performed in the objective space only. In general, this will lead to suboptimal and infeasible results. In this paper, we present new results regarding the trade-off between the quality of the results and the exploration time.
Introduction
Due to the large complexity of design spaces in embedded system design, heuristic techniques are mostly used to solve these in general multi-objective optimization problems. Different heuristic optimization techniques are discussed in the literature for non-hierarchical system synthesis (see [2, 3] ). In this paper, we propose Pareto-Front Arithmetics (PFA) to deal with the increased complexity in finding optimal solutions. PFA [5] is based on a hierarchical model of embedded systems [4] . This model allows the specification of design alternatives of the application algorithm as well as alternatives of different hardware architectures. Not only that this hierarchical approach helps designers to cope with the complexity, but it also captures the knowledge of problem composition.
The idea of PFA is to start exploring the Pareto-fronts by mapping the leaves in a given hierarchical specification. Later, these Pareto-fronts are combined to generate the Pareto-front on higher hierarchical levels. This way, we reduce the exploration time, but the constructed front might not be the true Pareto-front. Nevertheless, while using only a small fraction of time, we are able to find a substantial number of Pareto-optimal solutions.
In this paper, we focus on the trade-off between the quality of the results and the time needed for exploration. The concept of PFA was already mentioned in [6] and formalized in [1] . The application of PFA on embedded system synthesis problems is shown in [5, 4] .
The rest of the paper is organized as follows: The characteristics of hierarchical search spaces and hierarchical objective spaces are presented in section 2. The novel approach of Pareto-Front Arithmetics for fast design space exploration is outlined in section 3. Subsequently, we propose an algorithm based on Pareto-Front Arithmetics and uncertain objectives in order to improve the quality of the design points. Finally, section 4 proposes the application of PFA on embedded system synthesis. Furthermore, we will show by experiment that we are able to find a substantial number of Pareto-points and additional points near the true Paretofront by our novel approach.
Hierarchical Optimization Problems
Before considering the problem of design space exploration in embedded systems design, we will have a closer look on the more general problem of hierarchical optimization. Starting from (non-hierarchical) multi-objective optimization problems, we introduce the notion of hierarchical search spaces and hierarchical objective spaces.
A multi-objective optimization problem is given by: minimize o(x), subject to c(x) ≤ 0 where x = (x 1 , x 2 , . . . , x m ) ∈ X is the decision vector and X is called the search space. Furthermore, the constraints c(x) ≤ 0 determine the set of feasible solutions, where c is k-dimensional, i.e., c(x) = (c 1 (x), c 2 (x), . . . , c k (x)). In embedded system design, the decision vector encodes a particular implementation.
The objective function o is n-dimensional, i.e., we optimize n objectives simultaneously. There are k constraints Figure 1. Example of a (a) non-hierarchical decision vector and (b) a hierarchical decision vector consisting of a non-hierarchical part x 1 , x 3 , x 4 , x 6 , x 9 and a hierarchical part x 2 , x 5 , x 7 , x 8 .
Only those decision vectors x ∈ X that satisfy all constraints c i are in the set of feasible solutions, or for short in the feasible set called X f ⊆ X. The image of X is defined as Y = o(X) ⊂ R n , where the objective function o on the set X is given by o(X) = {o(x) | x ∈ X}. Analogously, the objective space is denoted by
In general, there is not only one global optimum, but a set of so-called Pareto-points [7] . A Pareto-optimal solution x p is a decision vector which is not worse than any other decision vector x ∈ X in all objectives (without loss of generality, we assume that all objectives are to be minimized), i.e., x ∈ X f : x x p , where
and the relations • ∈ {=, ≤, <, ≥, >} are defined as:
The set of all Pareto-optimal solutions is called the Pareto-set X p . An approximation of the Pareto-set X p will be termed quality set X q subsequently.
In the following, we assume that the search space has a hierarchical structure, i.e., each element x i of a decision vector x itself may be a vector (x i1 , x i2 , . . . , x iki ). The decision vector x consists of a non-hierarchical and a hierarchical part, i.e., x = x n , x h . Each element x h j ∈ x h itself may be a decision vector consisting of a non-hierarchical and a hierarchical part. Thus, the structure is not limited to only a single level of hierarchy. Figure 1 shows an example of (a) a nonhierarchical and (b) a hierarchical decision vector x. The non-hierarchical decision vector is a 1-dimensional vector. The hierarchical decision vector consists of a nonhierarchical part x n given by the elements x 1 , x 3 , x 4 , x 6 , and x 9 and a hierarchical part x h of the four elements x 2 , x 5 , x 7 , and x 8 .
Example 1
By using hierarchical decision vectors, we must reconsider the objective functions, too. On the top-level, the objective function is given by: o(x 1 , x 2 , . . . , x m ). Since we do not assume monotonicity for the objective functions, we must introduce a decomposition operator ⊗ to construct the toplevel objective function from deeper levels of hierarchy, i.e.,
where o n denotes the partial objective function for the non-hierarchical part of the decision vector, o h denotes the partial objective function for the elements of the hierarchical part of the decision vector, and Figure 1 
Example 2 For the decision vector shown in
Abraham et al. name three advantages of hierarchical decomposition [1] . The most important advantage refers to the fact that a feasible decision vector must be composed of feasible decision vectors of the elements in the subsystem. This reduces the search space drastically. Unfortunately, we cannot assume that a Pareto-optimal top-level decision vector is composed of Pareto-optimal decision vectors of the elements in the hierarchical part. Abraham et al. define necessary and sufficient conditions of the decomposition function of the objectives which guarantee Pareto-optimality for the top-level decision vector depending on the Pareto-optimality of elements in the hierarchical part of the decision vector [1] . Although these results are important and interesting, many optimization goals do not possess these monotonicity properties. In fact, they depend on the decomposition operator ⊗.
Pareto-Front Arithmetics
This section illustrates the novel approach of ParetoFront Arithmetics (PFA). PFA is used to solve optimization problems by exploiting the hierarchical structure of the underlying optimization problems.
The inputs to Pareto-Front Arithmetics are the quality sets from mutually disjunctive parts of a decision vector x. With PFA, we construct quality sets at higher levels according the structure of the decision vector. Figure 2 shows this concept of Pareto-Front Arithmetics for the decision vector given in Figure 1 . The main idea is now to do the necessary combinations in the objective space only. But how should this combination be established? Figure 3 shows the two most important operations used during PFA. The first operation is to take the maximum of each objective of two (or more) points (Figure 3(b) each Pareto-optimal point p 1i is combined with each Paretooptimal solution p 2j . The resulting objectives
) are filtered regarding Pareto-optimality. Figure 3 (c) outlines the addition of the objective of two or more Pareto-points: Each Pareto-optimal solution p 1i is combined with each point p 2j . Here, the resulting objectives are calculated as the sum of the objectives of the sub-
More formally, PFA operations can be defined as: o = h(y 1 , y 2 , . . . , y n ), where y j = o(x j ) ∀1 ≤ j ≤ n.
In embedded system design, these two operations may be used to construct a lower and an upper bound of the objectives, as will be shown later. For example, the implementation cost of a system that is composed of two subsystems can be restricted by the maximum implementation cost of each subsystem and the sum of those cost. The maximum of the cost of the two subsystems corresponds to the case where both subsystems share the same resource (e.g., IP core), while the sum of the cost model the fact that both subsystems are implemented using dedicated resources.
The objectives used during the optimization of embedded systems are non-monotonic due to resource sharing, power consumption being dependent on the binding, etc. Hence, we cannot claim Pareto-optimality for the implementations in the resulting optimality set when using PFA in general. But note: Even if we may not construct the best solutions, we are able to produce good implementations in less time by using PFA. This is due to the fact that we avoid the NP-complete computation of a feasible binding at higher hierarchical levels.
In order to improve the approach of Pareto-Front Arithmetics, we have to prevent the algorithm above from rejecting good points, i.e., we increase the quality of the results by still benefiting from short exploration times. This is done by considering a lower and upper bound of the objectives as described in the case of implementation cost. Figure 4 shows the concept of PFA using so-called uncertain objectives. The two quality sets to be combined are given in Figure 4(a) . The resulting quality set is given in Figure 4 (b). The objectives o 1 , o 2 are uncertain. Both ob- 
Unfortunately, by using property intervals, our definition of dominance becomes meaningless. In Figure 5 These problems arise when two property intervals overlap. This is also shown in Figure 5 . According to [8] , we use the notion of probabilistic dominance for Pareto-optimality. Here, we consider the case of uniform distributed design points, i.e., each discrete point y ∈ [o l , o u ] in a given property interval [o l , o u ] is with the same probability the actual Beside the processes and the resources, the specification also shows the possible bindings by mapping edges.
design point. Furthermore, we assume that all objectives are statistically independent.
For any two design points x 1 and x 2 , and m statistically independent objective functions o 1 , o 2 , . . . , o m the probability that x 1 dominates x 2 weakly is given by (see [8] ):
] denotes the probability that the objective value o i (x 1 ) of design point x 1 is less or equal than the corresponding objective value o i (x 2 ) of design point x 2 . [5] explains how to calculate this probability under the assumption of uniformly distributed design points.
In order to narrow our search space, we reject points which are dominated with a greater probability than a given, user specified probability bound p max . With this approach, we narrow our search space but still regard a great number of good points. Nevertheless, if we use a probability bound p max it still may happen that we reject a Pareto-optimal solution with a probability of 1 − p max .
Results
In this section, we present our results regarding the application of PFA on the problem of embedded system design. We start from a specification of our system by using a graphical representation. Figure 6 shows a specification of an embedded system. The behavior of the system is modeled by a so-called process graph g p where processes P are represented by vertices. A process p may be refined by a set of associated process graphs. The set of available resources R of the system is given by a so-called architecture graph g a . The relation between a process p ∈ P and a resource r ∈ R is modeled by a mapping edge m ∈ M ⊆ P × R. A mapping edge m = (p, r) indicates that process p can be implemented on resource r. Furthermore, there are some attributes associated with the mapping edges m ∈ M and the resources r ∈ R regarding the implementation cost c, the power consumption p, and the latency l.
Example 3
The specification in Figure 6 consists of two processes p 1 and p 2 . The process p 2 can be refined by any of the two associated process graphs consisting of the processes p 3 , p 4 , and p 5 , p 6 , p 7 . The architecture graph is given by the resources µP, DSP and FPGA. Figure 6 shows 11 mapping edges.
The goal of system synthesis is now to find an allocation α, i.e., a subset of resources, and a corresponding binding β, i.e., which process is implemented on which resource (r ∈ α). After determining the allocation and binding, we are able to compute the parameters of the implementation. Before considering the task of design space exploration, we discuss the objective space used in our benchmark.
Objective Space We consider a three-dimensional objective space which is defined by the most important objectives in embedded design: the implementation cost, the overall power consumption and the latency of an implementation.
Implementation Cost: The implementation cost c(x) for a given implementation x = (α, β) is given by the sum of costs of all allocated resources. In case that only the resources µP and FPGA are allocated in the example of Figure 6 , we have an implementation cost of c = 5000.
Power Consumption: The overall power consumption p(x) of a given implementation x = (α, β) is calculated by the sum of power consumption of all allocated resources plus the additional power consumption annotated at the mapping edges. Again, we consider the example given in Figure 6 . Furthermore, the allocation is given by α = {µP, FPGA} and the binding is β = {(p 1 , FPGA), (p 3 , µP), (p 4 , µP)}. Then the overall power consumption is given by p(x) = 200 + 28 = 228.
Latency: In order to get the latency l(x) of an implementation, we have to compute a schedule for all operations in the system. With the allocation and binding given above we are able to execute p 1 in parallel to p 3 , p 4 . The processes p 3 and p 4 must be performed sequentially due to the resource sharing. The minimum latency for this implementation is given by l(x) = max (8 + 7, 4) = 15.
Note, that all three objectives are non-monotonous.
Benchmark Here, we present our results in using ParetoFront Arithmetics in embedded system design. The hierarchical design space exploration starts from a specification as shown in Figure 6 . In a first step, we compute the partial implementations for each leaf graph in the specification. In a next step, we combine these results at a higher level of hierarchy as described in section 3. For the example in Figure 6 the decision vector is given by x = (x 1 , x 2 ), x 2 = (x 2,1 , x 2,2 ), and x 2,1 = (x 3 , x 4 ), x 2,2 = (x 5 , x 6 , x 7 ). We start with calculating the (partial) implementations for x 2,1 and x 2,2 only. Next, we combine these results.
In [5] we compared Pareto-Front Arithmetics against an approach based on Evolutionary Algorithms. We have shown that PFA performs better in a sense that we find a substantial number of Pareto-optimal solutions in a very short time. But, since we are facing non-monotonic objectives, we are not able to find all Pareto-points, while the Evolutionary Algorithms have been shown to do so.
In this paper, we compare our results against the true Pareto-set (calculated by exhaustive search) by using the following quality measurements. C(A, B) computes the relative number of points in a set B dominated by the points in the set A and is given by [10] :
The coverage
2. The distance D(A, B) computes the average distance from a point in the set A to its nearest point in B and is given by [9] :
where d i is the Euclidean distance between result i and the nearest member of the true Pareto-set B.
The results for the Pareto-Front Arithmetics, and PFA with three different probability bounds are shown in Figure 7 to Figure 9 . The probability bounds are chosen to 20%, 25%, and 30%. Figure 7 shows the coverage of the Pareto-sets by the quality sets computed by the PFA approach in dependency of the problem size. As we can see, we are able to find a substantial number of Pareto-points using our novel approach. Beside the true Pareto-points, we are able to find additional points near the true Pareto-front ( Figure 8 shows the distance of the quality sets computed by the PFA approach to the real Pareto-set). Figure 9 shows the number of combinations calculated by the PFA approach.
Obviously, we can improve our results (in coverage and distance) by varying the number of investigated points. This is done by using different probability bounds during the exploration. But, the most important point is that we can produce good results (for example C = 13.36% and D = 0.81 with our novel approach in a really short time (number of Combinations investigated: < 0.78%). Hence, ParetoFront Arithmetics should be used to construct initial solutions for iterative improving optimization techniques like Evolutionary Algorithms.
Conclusions
To handle the increasing complexity of embedded systems, we proposed Pareto-Front Arithmetics for fast design space exploration using results of subsystems to approximate the set of Pareto-optimal implementations. By using this novel approach, we can reduce the exploration time dramatically. In general, this leads to suboptimal results. In this paper, we analyzed the trade-off between the quality of the results and the exploration time. As a result, we have shown that we can find a substantial number of Paretooptimal points and a lot of additional points near the true Pareto-set. Thus, our approach seems to be good in creating initial solutions for iterative improving optimization techniques like Evolutionary Algorithms.
