Synthesising Energy-Efficient Embedded Systems with LOPOCOS by Schmitz, Marcus T. et al.
way to tackle the problem of reducing the energy dissipation is the usage of system level power management
techniques to trade-off system performance against power dissipation. One of the most promising techniques is
dynamic voltage scaling (DVS). By conjointly reducing supply voltage and operational frequency, DVS is able
to reduce the energy consumption significantly. Since the reduction of the operational frequency is equivalent to
the reduction of the system performance, this technique is applicable to applications where the system schedule
shows periods of idleness or periods where reduced performance can be tolerated. The field of DVS has its roots
in [41], where its usability was demonstrated for a desktop computer environment. The dynamic,
conjoint adjustment of supply voltage and operational frequency to satisfy the application's needs was shown to
lead to superior power savings compared to dynamic power management (switching off of idle components), when
both techniques are applicable [20]. This fact explains why DVS is attracting considerable attention from both
academia and industry. Most major digital processor vendors have recently introduced DVS-enabled processor
types [25, 3, 2]. Much research has focused on the scheduling aspects of DVS, often, however, under the
assumption of a single-processor system [22, 20, 38, 27]. The trends towards co-design methodologies and dynamic
voltage scalable processors indicate the need for co-design techniques which support the consideration of DVS
processors to synthesise energy-efficient embedded systems. Such a co-design framework will be presented in this
paper.
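The trade-off behind DVS can be illustrated with a small numerical sketch. It relies on a commonly used first-order CMOS model (dynamic energy proportional to the square of the supply voltage, clock frequency proportional to (Vdd - Vt)^2 / Vdd). The model, the function names and all numeric values below are assumptions chosen for illustration; none of them are taken from this paper.

```python
def dynamic_energy(cycles, vdd, c_eff=1e-9):
    """Dynamic switching energy in joules: E = C_eff * Vdd^2 * cycles (first-order model)."""
    return c_eff * vdd ** 2 * cycles


def relative_frequency(vdd, vt):
    """Clock frequency up to a constant factor: f ~ (Vdd - Vt)^2 / Vdd (assumed delay model)."""
    return (vdd - vt) ** 2 / vdd


# Illustrative values only (assumptions, not taken from the paper).
V_NOMINAL, V_SCALED, V_T = 3.3, 2.5, 0.8
CYCLES = 1_000_000

energy_ratio = dynamic_energy(CYCLES, V_SCALED) / dynamic_energy(CYCLES, V_NOMINAL)
slowdown = relative_frequency(V_NOMINAL, V_T) / relative_frequency(V_SCALED, V_T)

print(f"energy ratio: {energy_ratio:.2f}, execution time factor: {slowdown:.2f}")
```

Under these assumed values the task dissipates a little more than half of its nominal dynamic energy, at the cost of a longer execution time. This is why the technique only pays off where the schedule contains slack.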
Three research groups have addressed issues which have a close relationship to the problems solved in the
LOPOCOS system. Luo and Jha [29] have extended an existing co-synthesis approach [11] to account for DVS
processing elements. Their approach is based on a DVS algorithm which keeps communication events fixed, i.e.,
they reduce the global optimisation problem to smaller local problems which can be solved more easily and quickly.
Furthermore, they take battery characteristics into consideration [30] to increase the battery lifetime of the system. The
approach is very efficient in terms of optimisation time and achieved energy reduction when the processing
elements used show similar power consumption. However, when the architecture is built out of highly
heterogeneous PEs, this approach is likely to find sub-optimal solutions, from the DVS point of view, since the fixing of
communication events neglects the different power profiles during the schedule and mapping optimisation.
Gruian and Kuchcinski [18] presented an approach based on an energy-conscious list scheduling technique. The
dynamically re-calculated task priorities are based on the energy gain induced by scheduling decisions and the
critical path of the task graph. However, since task priorities based on energy gain might lead the list scheduler
to infeasible (timing-violating) schedules, the task priorities are calculated as a weighted sum of energy gain and
critical path priorities. A good trade-off factor is found by increasing the weight of the tasks on the critical path
and re-scheduling the tasks until a feasible schedule has been found.
Figure 6: Combined optimisation of task and communication mapping
is shown in Fig. 6(b). Let us consider a certain genetic operation which transforms the mapping string shown
in Fig. 6(b) into the one in Fig. 6(c). A quick check of this mapping reveals that this assignment of activities
represents an invalid solution. Consider, for example, the communication g0-1 between tasks t0 and t1. Although the
tasks are mapped onto PE0 and PE2, which are connected solely through CL0, the communication is mapped to
CL1. Hence, this mapping is invalid (assuming that only direct communications are allowed, without routing
over intermediate PEs).
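The validity check described above can be sketched as follows. The architecture table is partly an assumption: the text only states that PE0 and PE2 are connected solely through CL0 and that PE1 and PE2 can communicate over CL2 or CL3. The end points of CL1, the placement of t2 and all identifiers are hypothetical.

```python
# Which PEs each communication link (CL) connects.
# From the text: PE0-PE2 only via CL0; PE1-PE2 via CL2 or CL3.
# Assumption for illustration: CL1 connects PE0 and PE1.
LINKS = {
    "CL0": {"PE0", "PE2"},
    "CL1": {"PE0", "PE1"},
    "CL2": {"PE1", "PE2"},
    "CL3": {"PE1", "PE2"},
}

# Communications of the example task graph: name -> (source task, sink task).
COMMS = {"g0-1": ("t0", "t1"), "g1-2": ("t1", "t2")}


def invalid_communications(task_map, comm_map):
    """Return the communications whose link does not join the PEs of both tasks."""
    bad = []
    for comm, (src, dst) in COMMS.items():
        pes = {task_map[src], task_map[dst]}
        if len(pes) == 1:
            continue  # both tasks share a PE, so no link is required
        if not pes <= LINKS[comm_map[comm]]:
            bad.append(comm)
    return bad


# t0 on PE0 and t1 on PE2 as in the text; the placement of t2 is hypothetical.
task_map = {"t0": "PE0", "t1": "PE2", "t2": "PE1"}
print(invalid_communications(task_map, {"g0-1": "CL1", "g1-2": "CL2"}))
print(invalid_communications(task_map, {"g0-1": "CL0", "g1-2": "CL2"}))
```

The first call flags g0-1, because CL1 does not join PE0 and PE2. The second call, which uses CL0 instead, reports no violation.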
Needless to say, a combined task and communication optimisation would lead to a high number of invalid
solutions during the optimisation, which, in turn, would have a negative effect on the convergence of the population
towards high quality solutions. To overcome this problem we propose a combined optimisation of scheduling
and communication mapping, which is carried out for each task mapping candidate. In other words, the
communication mapping is carried out in Step 3 of the design flow shown in Fig. 1. In this way it is possible
to avoid invalid solutions, since all possible mappings of communication activities onto the communication links
are statically known for a particular task mapping. If the tasks t0 and t1, for example, are mapped to PE1 and
PE2, respectively, then the communication g0-1 can only be mapped onto CL2 or CL3. Our communication
optimisation, described next, takes advantage of this information to ensure that only valid solutions are produced.
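A minimal sketch of how the statically known link candidates could be looked up for a pair of PEs is given below. The link table is partly assumed (the end points of CL1 are not given in the text) and the function name is hypothetical.

```python
# Which PEs each communication link connects. CL0, CL2 and CL3 follow the
# examples in the text; the end points of CL1 are an assumption.
LINKS = {
    "CL0": {"PE0", "PE2"},
    "CL1": {"PE0", "PE1"},
    "CL2": {"PE1", "PE2"},
    "CL3": {"PE1", "PE2"},
}


def links_between(pe_a, pe_b):
    """Return all links that directly connect two distinct PEs (no routing via other PEs)."""
    if pe_a == pe_b:
        return []  # intra-PE communication needs no link
    return sorted(cl for cl, ends in LINKS.items() if ends == {pe_a, pe_b})


# t0 on PE1 and t1 on PE2: g0-1 may only use CL2 or CL3.
print(links_between("PE1", "PE2"))
# PE0 and PE2 are connected solely through CL0.
print(links_between("PE0", "PE2"))
```

Because the lookup depends only on the architecture and the task mapping, it can be evaluated once per task mapping candidate, before any communication mapping gene is generated.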
The presented communication mapping optimisation is carried out in parallel with the schedule optimisation.
To explain this strategy, consider the extended string representation shown in Fig. 7, which encodes both a
possible schedule and a communication mapping candidate. The string is divided into priority genes and
communication mapping genes. A list scheduler determines an execution order based on the encoded priorities,
while the mapping of communication activities onto the communication links is given by the communication
mapping genes. Here we concentrate on the communication optimisation; details about the schedule
optimisation can be found in [37]. This string representation allows the concurrent optimisation of priorities and
communication mappings using a single genetic algorithm. However, it necessitates specialised genetic
operators, such as crossover and mutation, which operate on the two string parts in parallel but without interference. This
is done to avoid a mixture of genes that would result in invalid offspring.
Figure 7: A combined priority and communication mapping string
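One way to picture genetic operators that act on both string parts in parallel but without interference is sketched below. The chromosome layout (a tuple of a priority list and a link list), the use of single-point crossover per part and all names are assumptions made for illustration, not the operators actually implemented in LOPOCOS.

```python
import random


def crossover_part(a, b, rng):
    """Single-point crossover within one string part; genes keep their positions."""
    if len(a) < 2:
        return list(a), list(b)  # nothing to cut in a part with fewer than two genes
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def crossover(parent_a, parent_b, rng):
    """Recombine priority genes and communication genes separately."""
    prio_a, comm_a = parent_a
    prio_b, comm_b = parent_b
    prio_1, prio_2 = crossover_part(prio_a, prio_b, rng)
    comm_1, comm_2 = crossover_part(comm_a, comm_b, rng)
    return (prio_1, comm_1), (prio_2, comm_2)


# Hypothetical parents: five priority genes and two communication mapping genes.
parent_a = ([5, 3, 4, 1, 2], ["CL2", "CL0"])
parent_b = ([1, 2, 3, 4, 5], ["CL3", "CL0"])
rng = random.Random(42)
child_a, child_b = crossover(parent_a, parent_b, rng)
print(child_a, child_b)
```

Since each part is cut and recombined on its own, a priority value can never end up in a communication mapping position, or the other way round.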
For each of the mapping genes, only valid values, which result in feasible solutions, are allowed. This is not
hard to achieve because the task mapping precedes the communication mapping. Thereby, for every
communicating pair of tasks the possible CLs are unambiguously specified. It is therefore possible to generate
random initial chromosomes (random in the sense that a random choice is made among the possible CLs) that
ensure proper communication mappings. Similarly, the mutation operator chooses randomly among the valid
possibilities. This validity is further maintained by the standard (single- or two-point mating) crossover operations,
because genes maintain their spatial position in the chromosome of Fig. 7.
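The argument that initialisation, mutation and position-preserving crossover all keep the communication mapping valid can be checked with a small sketch. The per-gene domains, the mutation rate and the function names are hypothetical.

```python
import random

# Valid CLs for each communication mapping gene under some task mapping
# (hypothetical domains chosen for illustration).
DOMAINS = [["CL2", "CL3"], ["CL0"], ["CL2", "CL3"]]


def random_mapping(domains, rng):
    """Initial chromosome: pick one valid CL per gene."""
    return [rng.choice(d) for d in domains]


def mutate(genes, domains, rng, rate=0.2):
    """Re-draw some genes, always from the gene's own valid domain."""
    return [rng.choice(d) if rng.random() < rate else g for g, d in zip(genes, domains)]


def two_point_crossover(a, b, rng):
    """Swap the segment between two cut points; positions are preserved."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def is_valid(genes, domains):
    """A mapping is valid if every gene holds a value from its own domain."""
    return len(genes) == len(domains) and all(g in d for g, d in zip(genes, domains))


rng = random.Random(1)
population = [random_mapping(DOMAINS, rng) for _ in range(20)]
for _ in range(100):
    a, b = rng.sample(population, 2)
    child, _other = two_point_crossover(a, b, rng)
    population.append(mutate(child, DOMAINS, rng))

print(all(is_valid(ind, DOMAINS) for ind in population))
```

Every gene at position i is only ever filled from the domain of position i, either by a random draw or by copying the gene at position i of a valid parent, so no repair step is needed.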
In order to keep the optimisation time low, the communication mapping string is dynamically adapted to the
particular task mapping, as the number of inter-PE communications changes. Of course, the valid values of
each gene also change dynamically in accordance with the task mapping. Note that the presented communication
mapping optimisation improves both the timing behaviour and the power consumption, due to the fitness
calculation based on equations (4) and (5). The mapping experiments, given in Section 3, indicate the importance
of this optimisation step for achieving high-quality solutions in terms of feasibility and energy savings.
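The dynamic adaptation of the communication mapping string can be sketched as a function that derives the gene layout from a task mapping. The link table is partly assumed (the end points of CL1 are not given in the text), and the task placements and names are hypothetical.

```python
# Link end points: CL0, CL2 and CL3 follow the examples in the text;
# CL1 joining PE0 and PE1 is an assumption.
LINKS = {
    "CL0": {"PE0", "PE2"},
    "CL1": {"PE0", "PE1"},
    "CL2": {"PE1", "PE2"},
    "CL3": {"PE1", "PE2"},
}
COMMS = {"g0-1": ("t0", "t1"), "g1-2": ("t1", "t2")}


def comm_gene_layout(task_map):
    """One gene per inter-PE communication, paired with the gene's valid CLs."""
    layout = []
    for comm, (src, dst) in COMMS.items():
        pe_a, pe_b = task_map[src], task_map[dst]
        if pe_a != pe_b:  # intra-PE communications need no gene
            valid = sorted(cl for cl, ends in LINKS.items() if ends == {pe_a, pe_b})
            layout.append((comm, valid))
    return layout


# Three hypothetical task mappings yielding two, one and zero genes.
print(comm_gene_layout({"t0": "PE1", "t1": "PE2", "t2": "PE0"}))
print(comm_gene_layout({"t0": "PE1", "t1": "PE1", "t2": "PE2"}))
print(comm_gene_layout({"t0": "PE0", "t1": "PE0", "t2": "PE0"}))
```

The string shrinks whenever communicating tasks share a PE, which keeps the search space of the inner optimisation no larger than the current task mapping requires.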