Linking codesign and reuse in embedded systems design by M Meerwein et al.
Linking Codesign and Reuse in Embedded Systems Design
M. Meerwein, C. Baumgartner
Robert Bosch GmbH

This paper presents a complete codesign environment for
embedded systems which combines automatic partitioning
with reuse from a module database. Special emphasis has
been put on satisfying the requirements of industrial design
practice and on the technical and economic constraints
associated with automotive control applications. The
object-oriented database architecture allows efficient management
of a large number of modules. Experimental results from a
real-world example demonstrate the viability and advantages
of the presented methodology.
1. Introduction
Automotive applications are highly cost-critical and yet
they have to satisfy stringent real-time constraints. Examples
are small, detached systems for sensor signal processing
or actuator control that take computational load off the main
control unit. While they are essentially of control-dominated
nature, they also comprise digital signal processing to a
small extent. Maximizing the software-implemented share
of the system preserves a high degree of flexibility later in
the product’s life cycle. On the other hand, a small number
of time-critical functional units (often signal processing)
would demand a powerful and expensive microprocessor in
a pure software solution. This is solved by adding
application specific hardware modules (coprocessors), making a less
powerful standard microcontroller sufficient to implement
the remaining functionality.
The need for continuous design time reduction has led to
a new design paradigm: reuse of existing, validated modules
(“Intellectual Property” or IP) from previous projects or
third party vendors. While IP reuse is not a panacea for any
sort of engineering problem, experience has shown that

In automotive applications, serious effort is spent on software
and hardware optimization in order to minimize product
cost, an investment that should be exploited as fully as
possible by reuse. Therefore, our goal is to control partitioning by the
availability of IP. We believe that combining automated
HW/SW partitioning and code reuse bears a large potential
for a further cut in product cost and design time.
2. Related Work
Current codesign approaches like [1], [2] and [3] focus on
interactive partitioning, performance estimation, cosimulation,
communication synthesis and code generation. Automated
partitioning has been under intense research (e.g.
[3], [5], [7] and [8]). However, manual partitioning is still
predominant in industry, with its quality depending largely
on the skills and experience of the designer.
While code reuse is traditionally widespread in software
engineering, the viability of this technique in the field of
hardware design has also been demonstrated. The use of
configurable modules and associated design flows [9] together with
version and configuration management [10] allows a new design
to be implemented as a composition of library components
with only minor adaptation effort. Integration with existing
simulation and synthesis tools into a heterogeneous design
environment enables the application of HDL code reuse in
industrial-scale projects.
In [11] a database-oriented reuse management system is
presented. Focusing on the aspects “design for reuse”,
“repository management” and “design by reuse”, this work
addresses many important topics of industrial application
such as usability, protection against unauthorized access and
design flow integration. Proposing a taxonomy-based
classification system, it allows the database-stored modules
to be located efficiently. Furthermore, that paper provides evidence
that design reuse pays off economically.
3. Linking Codesign and Reuse
In our context, the method of reusing existing modules
promises advantages over code synthesis:
• The real-time behavior and resource usage of existing
modules are well characterized by extensive profiling done in
the previous testing process. This allows for a substantia

tem during the partitioning process than estimation
performed before or during code synthesis. As our target
applications usually run under strict real-time
constraints, this is very important.
• Second, the software synthesis of common codesign
systems generates C or C++ code. In the case of 8-bit processors
with low register count, however, assembly language
programming has proved to be advantageous over compiling C
code. Due to cost constraints, these CPUs constitute a
significant share of industrial and automotive embedded systems.
Since most codesign frameworks lack support for assembly
language generation, reusing hand-written, heavily
optimized assembler modules is currently the only way to
overcome this problem short of writing code from scratch. The
high coding and optimization effort for these modules makes
their reuse especially worthwhile.
We have chosen POLIS as a base for our work because it
supports the codesign flow from specification input to
synthesized hardware and software output. Targeted at
control-dominated real-time systems, it suits the domain of
automotive control systems very well.
The goal of our work is to integrate reuse-driven automated
partitioning into the POLIS [1] codesign methodology
as outlined in figure 1. The application specification is
entered using Esterel [4] and passed, through POLIS, to the
partitioning system, where it is transformed into an internal
representation. It is enhanced with user-supplied timing and
resource usage constraints.
Figure 1: Extending the POLIS system
4. The Module Database
Our module database is running on a commercial re

use management system [11] that has proven its
effectiveness and stability in everyday design business. We
have added the capability to process essential
partitioning-relevant information by extending the existing storage
architecture. The entity-relationship diagram of the new data
model is shown in figure 2. The # sign indicates primary
keys, * stands for non-null attributes, and + designates
optional attributes.
Figure 2: Data Model of the Module Database
Every module is categorized within the reuse management
system by a three-level hierarchical taxonomy
classification system (table 1). This ordinal information is stored
together with the characterization data (real-time behavior,
resource requirements such as timers and ports, and cost metric) in
the MODULE table of the database. The timing
characterization data is obtained by the FPGA-based coverification
briefly outlined in chapter 5. It is capable of characterizing
large numbers of software modules in batch mode. The
PARAMS table contains the names and data types of
parameters (input/output variables, constants) associated with a
module. Inter-module dependencies (a module requiring
another one to work, a module functionally including another
one) are defined in the INCLUDED and NEEDED tables.
The object-orientation of the data model (property
inheritance from other modules or abstract classes) makes it s

us-modules differing only in minor details. Abstract classes do
not refer to instantiable code but define common properties
for a whole class of modules. Since we want to provide
multiple inheritance (m:n-relation), the inheritance information
is stored in the intermediate table PARENTS.
As an example, we look at the transformation of
rectangular coordinates (Re, Im) into their polar representation
(ϕ, Mag). First, an abstract class (Rect2Pol_meta, as shown
in table 2) defines characteristics such as the functional and
algorithm classification (taxonomy_1, taxonomy_3) or
the input and output variables (param_...) that are
common to all implementations of this operation. It is identified
by the abst_flag being TRUE.
The records for the individual implementations listed in
table 3 all inherit (parent_id = Rect2Pol_meta) the basic
properties from the abstract class defined above. Only those
attributes that are unique to each implementation, like the
implementation domain (taxonomy_2), the run time bounds
and the cost parameters, are listed in the associated record.
Since those modules refer to actual code and may be
instantiated, their abst_flag is FALSE.
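The attribute lookup that this inheritance scheme implies can be sketched in Python. This is only an illustration under stated assumptions, not the database's actual implementation: the dictionaries stand in for the MODULE and PARENTS tables, the implementation ids Rect2Pol_sw and Rect2Pol_ccp are hypothetical names, and the rule that locally stored attributes override inherited ones (with the first listed parent winning on conflicts) is an assumption. The attribute values are those of tables 2 and 3.

```python
from typing import Any, Dict, List, Tuple

# Stand-in for the MODULE table: each record stores only its own attributes.
MODULE: Dict[str, Dict[str, Any]] = {
    "Rect2Pol_meta": {
        "abst_flag": True,
        "cpu_type": "HC08",
        "param_name": ("Re", "Im", "Phi", "Mag"),
    },
    "Rect2Pol_sw": {   # hypothetical id for the "Pure SW" record
        "abst_flag": False,
        "taxonomy_2": "SW",
        "runtime_upper": 1325,
        "runtime_lower": 1305,
    },
    "Rect2Pol_ccp": {  # hypothetical id for the "HWSW (CCP)" record
        "abst_flag": False,
        "taxonomy_2": "HWSW",
        "runtime_upper": 52,
        "runtime_lower": 52,
        "needed_id": "CCP16",
    },
}

# Stand-in for the PARENTS table: (child, parent) pairs, m:n in general.
PARENTS: List[Tuple[str, str]] = [
    ("Rect2Pol_sw", "Rect2Pol_meta"),
    ("Rect2Pol_ccp", "Rect2Pol_meta"),
]


def resolve(module_id: str) -> Dict[str, Any]:
    """Return the full attribute set of a module, including inherited ones."""
    merged: Dict[str, Any] = {}
    for child, parent in PARENTS:
        if child == module_id:
            for key, value in resolve(parent).items():
                merged.setdefault(key, value)  # assumption: first parent wins
    merged.update(MODULE[module_id])  # assumption: local attributes override
    return merged


if __name__ == "__main__":
    print(resolve("Rect2Pol_ccp"))
```

Resolving Rect2Pol_sw in this sketch yields the parameter list and CPU type of the abstract class together with the record's own run time bounds, while its own abst_flag of FALSE replaces the inherited TRUE.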
The OBDD and CDFG tables provide storage for
machine-readable representations of the module’s functional
specification. They will be used in future partitioning
algorithms: by comparing them against the system specification,
the user is relieved from the need to manually assign t

param_name     Re      Im      Phi     Mag
param_width    16      16      16      16
param_signed   TRUE    TRUE    FALSE   FALSE
param_type     Input   Input   Output  Output
cpu_type       HC08
abst_flag      TRUE
Table 2: Abstract class for coordinate transformation
Attribute       Value
                Pure SW         HWSW (ACP)      HWSW (CCP)
parent_id       Rect2Pol_meta   Rect2Pol_meta   Rect2Pol_meta
taxonomy_2      SW              HWSW            HWSW
runtime_upper   1325            785             52
runtime_lower   1305            770             52
cost (ROM)      795             522             18
cost (kgates)   0               8               5
ram_usage       10              6               0
abst_flag       FALSE           FALSE           FALSE
needed_id       -               ACP16           CCP16

The database management software we use imposes a
limitation of no more than one column of a data type suited
to store binary data per table. Hence, the OBDD and CDFG
specifications and the module source code must be stored in
separate tables even though they have a 1:1 relation with the
MODULE table.
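The resulting layout, with one binary column per table and a shared key, can be sketched as follows. The sketch uses SQLite through Python's sqlite3 module purely as a stand-in: SQLite has no such one-column restriction, and the commercial system's real schema is not given here. The column names and the module_source table name are hypothetical; only MODULE, OBDD and CDFG are named in the text.

```python
import sqlite3
from typing import Optional

# Each side table holds exactly one binary column and reuses the module id
# as its primary key, which enforces the 1:1 relation with MODULE.
SCHEMA = """
CREATE TABLE module (
    id        TEXT PRIMARY KEY,
    abst_flag INTEGER NOT NULL
);
CREATE TABLE module_source (
    module_id TEXT PRIMARY KEY REFERENCES module(id),
    code      BLOB
);
CREATE TABLE obdd (
    module_id TEXT PRIMARY KEY REFERENCES module(id),
    spec      BLOB
);
CREATE TABLE cdfg (
    module_id TEXT PRIMARY KEY REFERENCES module(id),
    spec      BLOB
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched layout."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def fetch_source(con: sqlite3.Connection, module_id: str) -> Optional[bytes]:
    """Check out the source code of a module by joining on the shared key."""
    row = con.execute(
        "SELECT s.code FROM module m "
        "JOIN module_source s ON s.module_id = m.id WHERE m.id = ?",
        (module_id,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    con = open_db()
    con.execute("INSERT INTO module VALUES (?, ?)", ("Rect2Pol_sw", 0))
    con.execute("INSERT INTO module_source VALUES (?, ?)",
                ("Rect2Pol_sw", b"; assembler source"))
    print(fetch_source(con, "Rect2Pol_sw"))
```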
5. Automating the Partitioning Process
The goal of our partitioning algorithm is to pre-select
solutions meeting the real-time constraints and assign them a
cost metric to select the most preferable solution. With
product cost and flexibility in mind, we found software-biased,
timing-driven allocation to be preferable. Thus, as much of
the system as possible is implemented in software. Only in
cases where timing constraints are violated are blocks
mapped to hardware. However, some other factors
restrict the freedom of the partitioning choice:
• For standardization in an industrial environment, the
CPU model is usually predetermined for a whole application
family and is not modified to suit the problem.
• Standard microcontroller peripherals or reusable
hardware or software modules implemented earlier are preferred
over newly synthesized components.
• Coprocessors can provide a speedup in multiple places
while consuming chip area only once.
• Time-consuming hardware-software communication is to
be minimized.
• User-supplied implementation rules (“must be
hardware”, “must be software”) have to be obeyed to facilitate
migration between product generations.
After reading the specification (POLIS-written SHIFT
files), the application is clustered into modules. Currently,
the clustering granularity is inherently predetermined by the
modularity of the initial Esterel code read by POLIS. Next,
the modules are functionally categorized by interactively
assigning them to a functional class equivalent to level 1 of the
three-level hierarchical taxonomy classification system (cf.
table 1). Now, the data types of the interface variables,
whose literal names are taken from the SHIFT code, are
determined.
Subsequently, the partitioning algorithm as depicted in
figure 3 is invoked. For each module of the clustered system
specification, a database search for a reusable module
matching the functional classification (taxonomy level 1)
and the auxiliary constraints (implementation domain, CPU
type, resources, data types) is performed. If no
implementation constraint has been set, software modules are preferred.
If an appropriate entry can be located in the database, the
module is assigned the attribute “instantiate”; otherwise the
module is scheduled for code synthesis.
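The per-module search described above can be sketched in Python as below. It is a simplified illustration rather than the implemented query: resource and data type constraints are left out, the record names in the example are hypothetical, and the preference order beyond "software first" (HWSW before pure HW) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Module:
    name: str
    taxonomy_1: str  # functional classification (taxonomy level 1)
    taxonomy_2: str  # implementation domain: "SW", "HWSW" or "HW"
    cpu_type: str


# Software is preferred; the order of the remaining domains is an assumption.
PREFERENCE = ("SW", "HWSW", "HW")


def find_reusable(db: List[Module], function: str, cpu_type: str,
                  domain: Optional[str] = None) -> Optional[Module]:
    """Return the preferred database entry matching the constraints, if any."""
    matches = [
        m for m in db
        if m.taxonomy_1 == function
        and m.cpu_type == cpu_type
        and (domain is None or m.taxonomy_2 == domain)
    ]
    if not matches:
        return None
    return min(matches, key=lambda m: PREFERENCE.index(m.taxonomy_2))


def schedule(db: List[Module], function: str, cpu_type: str,
             domain: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Mark a specification module for instantiation or for code synthesis."""
    hit = find_reusable(db, function, cpu_type, domain)
    return ("instantiate", hit.name) if hit else ("synthesize", None)


if __name__ == "__main__":
    db = [
        Module("atan_sw", "Atan", "SW", "ST7"),      # hypothetical records
        Module("atan_ccp", "Atan", "HWSW", "ST7"),
        Module("mul_hc08", "Mul", "SW", "HC08"),
    ]
    print(schedule(db, "Atan", "ST7"))
    print(schedule(db, "Atan", "ST7", domain="HWSW"))
    print(schedule(db, "Mul", "ST7"))
```

With these example records, an unconstrained request for Atan on the ST7 selects the software entry, a "must be HWSW" rule selects the coprocessor-based one, and Mul falls back to synthesis because the only stored entry targets a different CPU.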
After this has been accomplished for all modules of the
system, the inter-module dependencies (“includes”,
“requires”) held in the reuse database are resolved: additional
modules are scheduled for instantiation and unnecessary

ndage) and real-time characteristics of the partitioned system
will be calculated and checked against the user-supplied
constraints. If they are satisfied, the partitioning process is
completed. If the real-time requirements (deadlines) are not
satisfied, modules are iteratively moved into the hardware
partition, again preferring instantiation over synthesis. Our
current strategy here is to start with the module with
the greatest runtime. If resource usage constraints are
violated (e.g. the software modules require more timers than exist),
the user is prompted to resolve the conflict. The user may either
change the available resources or manually create a
workaround for the problem (e.g. two modules sharing one
timer using a software scheduler).
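The deadline-driven iteration can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the total runtime is taken as the plain sum of the module runtimes, every module is assumed to have a software implementation, resource checks and dependency resolution are omitted, and the module names and cycle counts in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Impl:
    domain: str   # "SW" or "HW" (hardware partition)
    runtime: int  # worst-case cycles (hypothetical figures below)


def partition(candidates: Dict[str, List[Impl]],
              deadline: int) -> Optional[Dict[str, Impl]]:
    """Software-biased, timing-driven allocation.

    Start with every module in software. While the summed runtime exceeds
    the deadline, move the software module with the greatest runtime that
    has a hardware alternative into the hardware partition. Returns None
    if the deadline cannot be met this way.
    """
    def pick(name: str, domain: str) -> Optional[Impl]:
        return next((i for i in candidates[name] if i.domain == domain), None)

    # Assumption: each module offers a software implementation.
    alloc = {name: pick(name, "SW") for name in candidates}
    while sum(impl.runtime for impl in alloc.values()) > deadline:
        movable = [name for name, impl in alloc.items()
                   if impl.domain == "SW" and pick(name, "HW") is not None]
        if not movable:
            return None
        worst = max(movable, key=lambda name: alloc[name].runtime)
        alloc[worst] = pick(worst, "HW")
    return alloc


if __name__ == "__main__":
    candidates = {
        "filter": [Impl("SW", 900), Impl("HW", 50)],
        "control": [Impl("SW", 400)],
        "scale": [Impl("SW", 300), Impl("HW", 20)],
    }
    result = partition(candidates, deadline=1000)
    print({name: impl.domain for name, impl in result.items()})
```

In the example, the all-software allocation needs 1600 cycles against a deadline of 1000, so the slowest movable module ("filter") is moved to hardware and the remaining two stay in software.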
Figure 3: The partitioning algorithm
[Flowchart; surviving box labels: “Begin”, “... database, instantiate / syn...”, “Replace SW module with greatest runtime with HW”, “(taxonomy level 1) and CPU type, preferring SW implementation”]
When the partitioning run is finished, the allocation
information (HW/SW, synthesis/instantiation) is passed back into
POLIS, where a first verification can be performed by
Ptolemy-based simulation. Subsequently, in places where
existing modules have been scheduled for instantiation, the
corresponding VHDL or C code pieces are checked out from
the database. Otherwise POLIS synthesizes them. Finally,
the partitioning choice is in-system-verified. For this
purpose, we use a commercially available in-circuit emulator
for the CPU core coupled with FPGA boards to map
application specific controller peripherals. Modeling the
complete ASIC including its physical interfaces and analog
portions, we are able to obtain results that are significantly
more precise compared with cosimulation.
6. Results
We evaluated the performance of the presented approach
using the magneto-resistive steering wheel angle sensor
published in [12] as an example. The algorithm employed
essentially consists of three components:
• Offset and gain correction (add & multiply operation) of
the four raw sensor signals (quadrature components of Ψ and
Θ) coming from the ADC
• Transformation (arctan function) of the four quadrature
signals into two angles Ψ and Θ
• Evaluation of the vernier function with
mechanical constants a and b into the wheel angle Φ
The operand and result data types of all operations shall
be 16 bit two’s complement. The total real-time constraint
for the entire system is 4000 clock cycles. Since about 1000
cycles are required for communication and fault detection,
the constraint for the algorithm outlined above is 3000
cycles. The target microcontroller architecture is the ST7
family (cf. [12]) from STMicroelectronics.
An excerpt from the module database is shown in table
ACP stands for an arithmetic coprocessor and CCP denotes
a special coprocessor for the CORDIC algorithm. Columns
that are common to all rows (like cpu_type = ST7) have
been omitted.
The partitioning scheme presented in chapter 5 produces
the following results:
• A complete software implementation (result of first
iteration in figure 3) requires 1271 bytes of ROM space and 40
cycles of execution time. Hence it misses the real-time
requirement stated above.
• Moving the vernier resolution function (which
contributes the largest execution time share) into the HWSW
implementation domain requires instantiation of the ACP, needing
an additional 8 kgates. As the presence of the ACP enables the
transition of the other two operations into the HWSW
domain, the execution time is reduced to 1181 cycles and the
ROM usage is diminished to 788 bytes.
• Another valid alternative results if the implementation
domain of the vernier function is set to SW as a user
constraint. Now, the algorithm chooses to use the CCP for the
arctan evaluation. It requires an extra 5 kgates and speeds up
the entire calculation to 2753 cycles. The ROM usage
amounts to 497 bytes.
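The ROM figures quoted above can be cross-checked against the module database excerpt with a few lines of Python. This is a plausibility check under one assumption, namely that the code of each operation is stored once regardless of how often it is invoked; the per-module byte counts are copied from the excerpt (second numeric column), and the variant labels are ours.

```python
# ROM cost in bytes per operation and implementation variant, copied from
# the module database excerpt. "ACP" and "CCP" denote the HWSW variants
# that need the respective coprocessor.
ROM = {
    "Add": {"SW": 8, "ACP": 4},
    "Mul": {"SW": 60, "ACP": 4},
    "Atan": {"SW": 795, "ACP": 522, "CCP": 18},
    "Vernier": {"SW": 408, "ACP": 258},
}


def rom_total(choice):
    """Sum the ROM bytes of the chosen variant of every operation."""
    return sum(ROM[op][variant] for op, variant in choice.items())


ALL_SW = {op: "SW" for op in ROM}        # first result: pure software
ALL_ACP = {op: "ACP" for op in ROM}      # second result: everything on the ACP
CCP_ATAN = {**ALL_SW, "Atan": "CCP"}     # third result: only arctan on the CCP

if __name__ == "__main__":
    for label, choice in (("SW", ALL_SW), ("ACP", ALL_ACP), ("CCP", CCP_ATAN)):
        print(label, rom_total(choice))
```

The first two sums reproduce the 1271 and 788 bytes stated in the text. For the third alternative the same sum gives 494 bytes, three fewer than the 497 quoted, so the excerpt may not list everything that is counted there.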
At this time, the user may select between the second and
the third alternative. The latter requires less chip area for the
coprocessor and is hence favorable in terms of cost. If ROM
space is a prime concern, the third choice is also preferable,
while the second solution leaves significantly more slack in
terms of real-time behavior.
For comparison, we let POLIS synthesize all three
modules and compiled the resulting C files. The result takes up
2555 bytes of ROM space and needs 12360 cycles for
execution. Obviously, this is far beyond the constraints stated
above and demonstrates the inappropriateness of this
approach. The largest optimization potential for hand-written
assembler code, which remains unused with synthesized C code,
lies in the fact that a and b in the vernier function are
constants for a given mechanical sensor layout. This greatly
simplifies the multiply and divide operations.
7. Future Work
The partitioning granularity is currently dependent on the
Esterel coding style. To become more
specification-independent, we will explore the impact of granularity
refinement algorithms (broadening the decision space) on the
partitioning quality. The current partitioning algorithm and
its reaction to deadline violations are quite primitive and
bear considerable potential for improvement.
Assigning the modules to a functional taxonomy
classification is currently done manually. This will be automated by
using more abstract representations of functionality like
CDFGs and OBDDs. Another problem arises from the interface
variable names of the database-stored modules not matching
those in the specification. Currently, these literals are
neglected during partitioning, leading to ambiguities if module

Add       SW      15     8     0     -
Add       HWSW    7      4     0     ACP
Add       HW      1      0     0.2   -
Mul       SW      124    60    0     -
Mul       HWSW    7      4     0     ACP
Mul       HW      1      0     1.2   -
Atan      SW      1325   795   0     -
Atan      HWSW    785    522   0     ACP
Atan      HWSW    52     18    0     CCP
Vernier   SW      820    408   0     -
Vernier   HWSW    340    258   0     ACP
ACP       HW      -      -     8     -
CCP       HW      -      -     5     -

Our codesign system is targeted at industrial applications
and covers both partitioning and coverification. We
extended an existing cosimulation/cosynthesis environment by
integrating an automated partitioning engine. Working on a
database of reusable software and hardware modules, it
yields better results on systems built around small,
resource-limited controller cores.
The object-oriented storage architecture of the module
database allows efficient storage and retrieval of a large
number of similar modules. It also provides storage for
information needed in the further development of the
partitioning algorithm.
9. References
[1] Balarin, F., Chiodo, M., Giusto, P., Hsieh, H., Jurecska, A., Lavagno,
L., Passerone, C., Sangiovanni-Vincentelli, A., Sentovich, E., Suzuki,
K., Tabbara, B.: Hardware-Software Co-Design of Embedded
Systems: The POLIS Approach, Kluwer, Norwell, USA, 1997
[2] Madsen, J., Grode, J., Knudsen, P., Petersen, M., Haxthausen, A.:
LYCOS: The Lyngby Co-Synthesis System, Design Automation for
Embedded Systems, Volume 2, Issue 2, Kluwer, Norwell, USA, 1997
[3] Gajski, D., Vahid, F., Narayan, S., Gong, J.: SpecSyn: An
Environment Supporting the Specify-Explore-Refine Paradigm for
Hardware/Software System Design, IEEE Trans. on VLSI Systems, Vol. 6, No. 1,
pp. 84-100, 1998
[4] Berry, G.: Esterel v5 Language Primer, Ecole des Mines and INRIA,
Sophia-Antipolis, France, 1998
[5] Ernst, R., Henkel, J., Benner, T.: Hardware-Software Cosynthesis for
Microcontrollers, IEEE Design & Test of Computers, December 1993
[6] Eles, P., Peng, Z., Kuchcinski, K., Doboli, A.: System Level
Hardware/Software Partitioning Based on Simulated Annealing and Tabu
Search, Design Automation for Embedded Systems, Volume 2, Issue
1, January 1997, Kluwer, Norwell, USA
[7] Vahid, F.: Modifying Min-Cut for Hardware and Software Functional
Partitioning, 5th International Workshop on Hardware/Software
Codesign, March 24-26, 1997, Braunschweig
[8] Hollstein, T., Becker, J., Kirschbaum, A., Glesner, M.: HiPART: A
New Hierarchical Semi-Interactive HW/SW Partitioning Approach
with Fast Debugging for Real-Time Embedded Systems, 6th
International Workshop on HW/SW Codesign, March 15-18, 1998, Seattle
[9] Koegst, M., Conradi, P., Garte, D., Wahl, M.: A Systematic Analysis
of Reuse Strategies for Design of Electronic Circuits, Proc. of DATE
Conference 1998, February 23-26, 1998, Paris, France
[10] Olcoz, S., Ayuda, L., Izaguirre, I., Peñalba, O.: VHDL Teamwork,
Organization Units and Workspace Management, Proc. of DATE
Conference 1998, February 23-26, 1998, Paris, France
[11] Reutter, A., Rosenstiel, W.: An Efficient Reuse System for Digital
Circuit Design, Proc. of DATE Conference 1999, March 9-12, 1999,
Munich, Germany
[12] Reichmeyer, H.: Erfassung von Winkel und Positionen im KFZ,
Elektronik Industrie 7/1999, p. 35ff, Huethig, Munich, Germany
