Reconfiguration of legacy software artifacts in resource constraint embedded systems by Baldin, Daniel
RECONFIGURATION OF LEGACY SOFTWARE ARTIFACTS IN
RESOURCE CONSTRAINT EMBEDDED SYSTEMS
daniel baldin
A thesis submitted to the
Faculty of Computer Science, Electrical Engineering and Mathematics
of the
University of Paderborn
in partial fulfillment of the requirements for the degree of
Dr. rer. nat.
Paderborn, Germany
February 
date of public examination:
April , 
location:
Paderborn
Daniel Baldin: Reconfiguration of Legacy Software Artifacts in Resource Constraint
Embedded Systems, © February 
ABSTRACT
Highly resource-constrained embedded systems are everywhere around us. Some of
them can be found inside smartphones, electronic control units (ECU), others in
wireless sensor networks or smart cards. The last two systems are among the most
restrictive ones in the sense of processing power, energy consumption and memory
availability. Pricing policies often lead to a reduction in software functionality as
cheaper hardware with fewer resources is demanded for the final product. In order to
allow more complex software to run on such constrained systems, this thesis proposes
the use of software reconfiguration. In contrast to traditional uses of reconfiguration,
which allow a system to support, e.g., software upgrades, this thesis proposes the
use of reconfiguration mechanisms in order to reduce the footprint of a deeply
embedded application while maintaining real-time constraints.
Today’s adaptable architectures require the support of reconfigurability and adapt-
ability at design level. However, modern software products are often constructed
out of reusable but non-adaptable legacy software artifacts (e.g., libraries) to meet
early time-to-market requirements. This thesis proposes a methodology to semi-
automatically use existing binaries in a reconfigurable manner. It is based on using
binary analysis techniques to reconstruct the semantics of the binary application in
order to allow the system developer to select meaningful code parts as components
from the binary code in an easy-to-use manner. Using a set of high-level constraints,
the user is able to extract components from the binary application. These components
are then subject to a design space exploration step, which optimizes the
resulting reconfigurable system with regard to parameters such as worst-case blocking
time and flash lifetime. With this approach, reconfiguration can be added with
low effort to non-adaptive binary software in order to decrease the footprint of the
application while maintaining real-time constraints.
ZUSAMMENFASSUNG (SUMMARY)
Highly resource-constrained embedded systems are all around us. Some of these
systems are found in smartphones or electronic control units, others in sensor
networks or smart cards. The latter in particular are among the most constrained
systems in terms of processing power and memory. Software functionality is often
reduced for financial reasons so that cheaper hardware with fewer resources can be
used. To provide more functionality at the same resource utilization, this thesis
introduces a method that solves exactly this problem by means of reconfiguration
techniques. In contrast to traditional uses of reconfiguration techniques, which
allow, e.g., software updates to be performed, this thesis uses reconfiguration to
reduce the application size.
Today's architectures that enable reconfiguration rely on support for these
mechanisms at the design or source code level. Software solutions, however, are
largely based on reusable libraries or third-party components, which bring no
support for reconfiguration and are mostly available in binary form. This thesis
presents a method for automatically transforming an existing system into a
reconfigurable system using binary code, with the goal of reducing the application
size while still meeting its hard real-time constraints. The method is based on
binary code analysis techniques for reconstructing the application semantics, which
allow the user to extract components from the application by means of constraints
in a high-level language. These components are subsequently optimized by a design
space exploration with respect to the global worst-case blocking time of a task and
the lifetime of the flash memory. With this method, non-reconfigurable binary
software systems can be transformed into reconfigurable systems that reduce the
application size while meeting hard real-time constraints.
We have seen that computer programming is an art,
because it applies accumulated knowledge to the world,
because it requires skill and ingenuity, and especially
because it produces objects of beauty.
— Donald E. Knuth [Knu]
ACKNOWLEDGMENTS
I would like to express my deep gratitude to Professor Dr. Franz J. Rammig, my
research supervisor, for their patient guidance, enthusiastic encouragement and use-
ful critiques of this research work. I would also like to thank Prof. Dr. Uwe Kastens
for his early and very good feedback on many aspects of this thesis, which helped
to improve the work.
I would also like to thank all my colleagues at the University of Paderborn inside my
workgroup, at the C-Lab, the s-Lab and the members of the research project this thesis
evolved from. The discussions and the joint work resulting in several publications
helped a lot to create this work.
Last, I would like to acknowledge the support provided by my family during the
preparation of my final year project.
CONTENTS
i foundation
  introduction
    Motivation
    Goal of the Thesis
    Thesis Contributions
    Thesis Outline
  basics
    Smart-Card SOC
    Object Files
    The ELF Object File Format
    Program Representation
      Control Flow Graph
      Call Graph
    Data Flow Analysis
      Data Flow Analysis Equations
      Data Flow Problems
    Contexts
      Graphs with Context
      Example
    Summary
  related work
    Binary Analysis Approaches
      Link-Time Optimization
      Problems solved by Binary Analysis
    Program Analysis Problems
      Code Discovery
      Self modifying code
      Indirect Control Flow Target Detection
      Detecting Idioms
    Tools
      Binary Decoding
    Reconfigurable/Adaptable Systems
      Structural Reconfiguration Mechanisms
    Conclusion
ii the approach
  legacy code reconfiguration
    Requirements
    Methodology
    Consistency Preservation
      Integrity
      Consistency
      State-Invariant
    Architecture Restrictions of this Thesis
    Open Problems
  control flow reconstruction
    Building the Control Flow Graph
      Interprocedural Control Flow Graph
      Basic Block Augmentation
      Indirect Control Flow Target Resolution
      Safe Over-approximation
    Summary
  component model
    Defining Reconfiguration Components
    Reconstructing the Application Semantics
    Generating the High-Level Annotated Control Flow Graph
      High Level Expression Detection and Normalization
      Memory Access Patterns
      Arithmetic and Binary Patterns
      High Level Variable Substitution
      Global Variable Detection
    Constraint-based Component Identification
    Ensuring Disjoint Components
    Summary
  runtime reconfiguration
    The Reconfiguration Architecture
    The Reconfiguration Protocol
    Reconfiguration Activities
      Memory Management
      Replacement Strategy
      Indirection Layer
    Operating System Integration
    Real Time Characteristics
    Summary
  component optimization
    Target System Restrictions / Notation
    Optimization Steps
      Component Partitioning
      Component Merging
    Calculating the Worst Case Blocking Time κ
      Efficient WCET Calculation by Path Enumeration
      Handling Cyclic Reconfigurations
      Speedup of the Algorithm
      Quality of the Estimation
    Design Space Exploration
    Summary
  binary transformation
    Modification Flow
    ELF File Modification
      Instrumentation Code
      Data Duplication
      Additional Linker Symbols
    Summary
iii evaluation
  evaluation
    Case Study - SmartCard IPStack
      Design Time Overhead
      Reconfiguration Manager Binary Overhead
      Component Extraction
      Reconfiguration Delay Function
    Design Space Exploration
    Summary
  conclusion and future work
    Thesis Summary
    Outlook
iv appendix
  a appendix
    Mathematical Notation
    Additional Path Enumeration Considerations
    Calling Convention (RC_ABI)
    System Constraint Language ABNF
    Evaluation Design Points
    The ARMv(t) ISA
  bibliography
LIST OF FIGURES
Figure : An example tool flow for a software development process. Binary objects may be created by different producers in forms of, e.g., libraries.
Figure : The general idea of identifying components inside binary objects by means of control flow.
Figure : The different steps of the approach proposed in this thesis.
Figure : A typical block diagram of a smart-card SOC.
Figure : The structure of the Executable and Linkable Format (ELF) object file format and a sample relocatable ELF file.
Figure : A simple directed graph.
Figure : The context insensitive call graph of the example program seen in Figure .
Figure : Example Program
Figure : Demonstration of the context sensitive graph construction: a) a context-insensitive call graph G, b) the corresponding call graph with context Gc
Figure : Illustration of the Code Discovery Problem. The word at address 0x3002 may either be interpreted as an instruction or a data word.
Figure : An example of self modifying code inside the ARM THUMB ISA. Address x is modified by the instruction at address x.
Figure : Top-down disassembly of binary object code with symbol annotation.
Figure : Overview of the reconfiguration methodology.
Figure : Activity diagram of the Control Flow Graph (CFG) generation process. A series of methods are used to create a safe representation while trying to be as precise as possible.
Figure : A small example assembler function and its basic blocks decoded from its binary object file.
Figure : A part of the CFG of the example program of Figure  generated by Algorithm .
Figure : Control Flow Augmentation of ARM conditional instruction execution.
Figure : Detection of relocation targets for indirect control flow.
Figure : Allowing the user to select components: identification of components inside binary objects by means of control flows.
Figure : Little Endian Memory Layout of the list structure for the ARM EABI. Memory addresses increase from left to right.
Figure : Part of the SSL annotated control flow graph of an Internet Protocol Stack developed for smart cards. The visible part depicts the API method ethernet_input.
Figure : The Intermediate Components Nri for an example CFG based on definition ...
Figure : The intersection steps of Algorithm  illustrated on the example CFG of Figure . The corresponding direct dependency graph is displayed on the right side.
Figure : The reconfiguration architecture and the possible control flows: (.) control flow from a component to the Mandatory Set, (.) control flow from between components, (). control flow from the Mandatory Set to a component.
Figure : The time intervals of the reconfiguration protocol and the activity of the receive (Rx) and transmit (Tx) lines.
Figure : Activities of the reconfiguration manager upon entering a component.
Figure : Component placement using a LRU replacement data structures.
Figure : State of the LRU data structure, implemented as a double linked list, based on Figure .
Figure : The interface required/provided by the reconfiguration manager.
Figure : Illustration of the reconfiguration blocking time and the finish time of a task.
Figure : Illustration of the sizes Pc, Pf, si and λi
Figure : Overview of the optimization steps involved in the selection of a suitable design.
Figure : The partitioning function θobj illustrated. A component is partitioned into multiple components based on the linear order of the basic blocks inside their corresponding object file.
Figure : Example reconfiguration time intervals for a) not merged components, b) merged components with size fitting into the last chunk of data, c) merged components with an additional data transfer cycle.
Figure : Illustration of the path traversal on the context sensitive graph. During the traversal the node ni is reached with different reconfiguration contexts.
Figure : Example of two reconfiguration contexts during a path traversal with n = 3 number of component slots. The dotted square around a set of nodes defines a component. The underlined elements inside a context trigger a reconfiguration as the component entered is currently not loaded.
Figure : Condition A. of the bound condition for the worst case reconfiguration delay.
Figure : Loop unrolling for the calculation of the worst case reconfiguration blocking time. Loop is unrolled once to simulate one iteration of it. All following iterations are dropped resulting in a finite number of paths.
Figure : Reduction of the ICFG. Nodes, lying on a path to one of the components S1 or S2, are marked and may be used for the path enumeration.
Figure : Demonstration of a path inside the CFG which can not be taken at runtime.
Figure : The design point parameter rating function fr. The function value fr linearly decreases from 1 to 0. vmax acts as a delimiter of the value v.
Figure : Overview of the binary transformation step. The object files are transformed and inserted back into the linking step. The final binary is used to update the references inside the reconfiguration Components.
Figure : Addition of instrumentation code for a reconfiguration edge between two successive basic blocks.
Figure : Addition of instrumentation code for control flow into the Mandatory Set. An additional symbol needs to be created to identify the absolute address of node n2 inside the final binary.
Figure : Addition of instrumentation code for branches to components. The handler basic block needs to be inserted inside the object file at a suitable location inside the branch distance of node n1.
Figure : Interception of return statements into components. The return address is modified prior to the corresponding function call to allow the reconfiguration manager to intercept the return.
Figure : Insertion of section symbols to ensure the correct linking process of relocation entries inside the reconfiguration components.
Figure : Development of the blocking time κc in µs and its parts tfull and tresidue for different component sizes with sp = 512, Pf = 256.
Figure : Comparison of the worst case blocking time function κc in µs and the measured values for different component sizes with sp = 512, Pf = 256. The lower deviation is shown for the red (measured) values.
Figure : The error between the function κc and the measured values in percent.
Figure : Measurement of the blocking time in ms for a HTTP GET request on the SFSCI smart card using the reconfiguration approach and design Pm = 4608, Pc = 1536.
Figure : The steps of the reconfiguration methodology as proposed in this thesis.
Figure : A loop containing a conditional branch unrolled with the loop removed after unrolling.
Figure : ABI of the reconfiguration manager providing the reconfiguration indirection mechanism.
Figure : The Program Status Register of a ARM processor. Empty bit fields are processor specific and are left out for abstraction.
LIST OF TABLES
Table : Data Flow Analysis Equations
Table : Copy propagation example.
Table : Overview of different structural reconfiguration mechanisms based on [Jan].
Table : Detection of function return statements by copy propagation analysis.
Table : Example program part with RTL annotation
Table : Extracted type information for struct list of Listing .
Table : Memory Access RTL Normalization Patterns for a Little-Endian architecture. τ: arbitrary RTL expression. φ: RTL expression containing only expressions connected with the binary or operator or the empty expression. n, m: constant numbers.
Table : Arithmetic RTL Normalization Patterns for a Little-Endian architecture. τ arbitrary RTL expression. n, m constant numbers.
Table : Structure RTL Normalization Patterns for a Little-Endian architecture. n the bit representation of the load offset. [field] is the field in the structure pointed to be l which fulfills the constraints of the replacement pattern. s is the bitsize of the load operation [LOAD].
Table : Path enumeration time for the evaluation application with and without speedup. The complete application consist of  nodes. Five components with a total number of  nodes have been used for the calculation.
Table : Instructions that need to be modified inside the binary rewriting process for the ARMv(T) ISA.
Table : An example relocation table before and after the binary rewriting process.
Table : Execution time of the design flow steps for the example scenario.
Table : Extracted component sizes in bytes after the component merging process.
Table : System Timings for the evaluation scenario running on the SFSCI smart-card with an ARMvt processor at  Mhz Clock Frequency.
Table : Pareto optimal design points (Pm, Pc) of the design space exploration for the components of Table  over the parameter dr, dw, Pm. Some additional information on the design points as the binary overhead and the overall size decrease of the system are listed as well.
Table : Lifetime of the Pareto optimal design with a maximum number of flash rewrites of fmax = 1000000 and a flash wearout of dw = 8.
Table : Design points of the design space exploration of the evaluation scenario.
Table : Continuation of Table 
Table : The semantics of the PSR bitfields.
Table : The ARM mnemonics referenced in this thesis. For a complete list of all ARM/THUMB mnemonics see [ARMa].
LISTINGS
Listing : DU/UD chain example
Listing : Example C-Header containing a type definition of a structure containing bit fields, pointers and attributes.
Listing : Example of a structure access.
Listing : Example Constraint Set. The corresponding ABNF can be found in the Appendix in Listing .
Listing : Constraint set used for the evaluation example to extract the IPv, TCP and TLS Components.
Listing : THUMB indirection to another component without returning.
Listing : THUMB indirection to another component with return.
Listing : THUMB indirection to Mandatory Code without return.
Listing : THUMB indirection to Mandatory Code with return.
Listing : ARM indirection to Mandatory Code without return.
Listing : ARM indirection to Mandatory Code with return.
Listing : ABNF of the constraint input language
ACRONYMS
API Application Programming Interface
ABI Application Binary Interface
EABI Embedded Application Binary Interface
ISA Instruction Set Architecture
ELF Executable and Linkable Format
CFG Control Flow Graph
ICFG Interprocedural Control Flow Graph
CG Call Graph
JIT Just-In-Time
SSL Semantic Specification Language
RTL Register Transfer List
LRU Least Recently Used
OS Operating System
RM Rate Monotonic
DM Deadline Monotonic
EDF Earliest Deadline First
Part I
FOUNDATION

1
INTRODUCTION
Highly resource-constrained embedded systems are everywhere around us. They
can be found inside smartphones, electronic control units (ECU), wireless sensor
networks or smart cards. The last two systems are among the most restrictive ones
in terms of processing power, energy consumption and memory availability.
Additionally, smart cards are deployed in huge numbers, which often leads to the
requirement of using as few resources as possible in order to use a cheaper smart card for
the final product.
. motivation
Although Moore's law also holds for highly resource-constrained systems, they still
have very restrictive memory constraints nowadays. Most of these systems only
support tens of kilobytes of non-volatile memory and much less volatile memory.
However, there is high demand for applications requiring more resources. This may
originate from additional operating system functionality such as protocols, drivers or
new services. The memory restrictions of small targets also prohibit the use
of many software quality tests if no prototyping hardware is available. For example,
generating code coverage statistics on small targets at runtime is often not possible
or very problematic as it demands additional resources [RD].
The approach in this thesis allows more complex software to run on such resource
constrained systems with only few changes to the application code. The technique
proposed in this thesis, which allows more code to be executed on, e.g., a smart
card than what would be possible within the physically available resources, is based
on runtime reconfiguration. Support for exchanging or reloading software parts has
been asked for since  [Van]. However, up to now no reconfiguration approach
has been proposed which suits the following needs:
1. Support for legacy code: Software developers often do not have access to the
source code of, e.g., board support packages from third party vendors.
2. Incorporation of memory restrictions: Many embedded systems use flash memory
for program storage, imposing restrictions on the component loading process.
3. The reconfiguration overhead needs to be kept to a minimum in order to gain
any benefit from the reconfiguration on a highly resource-constrained system.
(Footnote: assuming no prototype platform with much more memory is available.)
In the literature, the term legacy code is used with different meanings. In order to avoid any
ambiguity, legacy code in this thesis refers to code for which no source code
is available; only the binary code in the form of object code is available. However, it
is assumed that the high-level Application Programming Interface (API) is available,
which is used by higher-level programming languages to access the functions of
object code libraries. The availability of these method signatures is only a small
restriction since even proprietary libraries include header files containing structure
and method signatures describing the API of the library. If this were not the case,
the entire library would not be usable by any higher-level programming language as
the interfaces would be unknown.
While reconfiguration may offer the possibility to exchange parts of the Operating
System (OS) or the application running on a system, the increase in software flexibility
and the reduction in footprint can only be achieved at the cost of an increased execution
time. Finding a good balance between these two requirements is crucial and a major
part of this thesis.
. goal of the thesis
The first question to be answered is how to integrate the
reconfiguration process, including legacy code, into a typical software development
process. The integration may take place at different steps, each of which
raises different problems that need to be solved. Figure  describes a traditional
tool flow inside a software development process with two steps: compilation and
system linking. The process may be enriched by both post link-time optimization
steps on the right end of the flow and source code generators on the left end. This
thesis concentrates on the object file level, before link time and after the binary objects
have been created.
[Figure: Source → Compiler → Object (three parallel paths, grouped under Producer A and Producer B) → Linker → Executable ELF Object]
Figure : An example tool flow for a software development process. Binary objects may
be created by different producers in forms of, e.g., libraries.
The final software product is often a composition of self-written code parts and
application code created by third party developers. The percentage of code from
third party developers may be quite high for certain products. To stay as
close as possible to the source code level, the approach proposed in
this thesis is integrated at the object code level, before the system is linked. This
allows the use of high-level information such as the API of the objects, which is
provided in order to use the object files from different producers.
The goal of the thesis is to allow applications, consisting of third party binary code,
which exceed the size of the available memory to run on a highly resource constrained
device by means of runtime reconfiguration. Even the most optimized application
may reach a lower bound on memory consumption for execution. If the physically
needed amount of memory needs to be further decreased, while maintaining the full
functional and temporal properties of the system, a different approach is needed.
The thesis solves this problem by replacing parts of the application at runtime
whenever the required functionality changes.
Existing reconfiguration approaches do not offer a sufficient solution for this
problem statement, because no existing reconfiguration system works at the
binary object level. However, the binary object level is the only one available when
considering legacy implementations. The functionality of such applications can only
be specified by their binary code and the control flow between parts of the binary
code. For the reconfiguration of the system at the binary level, the approach defines
components as parts of the executable code of the application (see Figure ).
A reconfiguration approach working on this level has to answer the following
questions:
[Figure: a Binary Object (e.g., ARMv4 ELF Object) used by a Software Product through its API; Components are identified inside the binary object by means of Control Flow and are subject to On-Demand Reconfiguration]
Figure : The general idea of identifying components inside binary objects by means of
control flow.
• How can meaningful components be extracted from binary code if no source
code is available?
Existing solutions use predefined interfaces as well as dependency
specifications on source code level. Such an approach cannot be applied on the
binary code level in the way it is done on source code level. Manually selecting
binary code parts is not practical unless the user is an expert in machine
programming.
• How can a "good" design of the reconfiguration system be derived depending on
static deployment parameters such as memory usage and worst case execution
time?
Performance optimization parameters for highly resource-constrained systems
differ from traditional server or desktop computing parameters and have to
cope with different constraints of the system. Depending on the importance
of different constraints, the design parameters of the reconfiguration approach
may differ. One important parameter for embedded systems is the set of hard
real-time constraints. Changing parameters of the reconfiguration approach may lead
to invalid designs which violate hard real-time constraints. Additionally,
most highly resource-constrained systems store and execute the application code
in flash memory, which supports only a maximum number of reprogramming
cycles; this needs to be taken into consideration.
 The term component is used within this thesis with the meaning of just denoting a piece of (binary)
code which is suitable for potential reconfiguration. This meaning of the term component is not to
be confused with the term used within the context of, e.g. UML.
[Figure: the steps Binary Analysis, Component Extraction, Optimization and Binary Transformation; the System Designer defines rules]
Figure : The different steps of the approach proposed in this thesis.
• How can the original application be transformed automatically into an
application which supports reconfiguration?
For the approach to be applicable, it is important that it is as
automated as possible. As no source code is available, all modifications also
need to be done on binary level, which complicates the problem.
In contrast to traditional reconfigurable systems, one major assumption is that the
total available physical memory space is smaller than the sum of the memory
requirements of all components that may be needed at runtime. This inherently leads
to an optimization problem, as not all components can be loaded at the same time.
The steps of the approach proposed in this thesis are depicted in Figure . In order
to answer the question how components can be identified on binary code level, the
first step of the approach is a binary analysis. A combination of well known
techniques will be used to reverse engineer the program behavior as much as
possible. The approach, however, does not rely on the completeness of the reverse
engineering process. Instead, the possibility to extract components scales with the
amount of reverse engineered application behavior. As more parts of the binary code
are annotated with their corresponding high level representation, the chances to
extract components using the approach proposed in this thesis increase. The proposed
approach allows the system designer to specify rules by means of program
constraints, which are used to extract components for reconfiguration.
The next part of the proposed approach describes the process of deriving a "good"
design before deploying the system. The quality of a design is measured in two steps.
In the first step a Pareto optimization keeps only the Pareto-optimal configurations.
The second step chooses one configuration out of the set of Pareto-optimal
ones by reducing the multi-dimensional optimization problem to a one-dimensional
optimization problem using user-specific weights. This approach takes into
account three optimization parameters. The first one is the flash wearout, which
depends on the number of flash erase/write cycles that may happen at runtime due
to reconfigurations of the system. The number of reconfigurations heavily depends
on multiple design parameters. The second one is the worst case blocking time of
the system due to reconfiguration, which heavily depends on the possible execution
paths and, thus, the possible patterns of reconfigurations during the program
execution. The last parameter is the maximum amount of memory space the system
may use for loading components.
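The two-step selection can be illustrated with a minimal sketch. It assumes that all three parameters are to be minimized and that the second step is a normalized weighted sum; the `Design` type, the example values and the weights are hypothetical and are not taken from the thesis.

```python
from typing import NamedTuple

class Design(NamedTuple):
    name: str
    flash_writes: int     # erase/write cycles caused by reconfigurations
    blocking_time: float  # worst case blocking time (hypothetical unit: ms)
    memory: int           # memory reserved for loading components (bytes)

def objectives(d):
    return (d.flash_writes, d.blocking_time, d.memory)

def dominates(a, b):
    """a dominates b if it is no worse in every objective and differs in one."""
    oa, ob = objectives(a), objectives(b)
    return all(x <= y for x, y in zip(oa, ob)) and oa != ob

def pareto_front(designs):
    """Step 1: keep only designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

def pick(designs, weights):
    """Step 2: collapse the objectives into one score using user weights."""
    front = pareto_front(designs)
    # Normalize by the maximum on the front so that the weights are comparable.
    maxima = [max(objectives(d)[i] for d in front) or 1 for i in range(3)]
    def score(d):
        return sum(w * v / m for w, v, m in zip(weights, objectives(d), maxima))
    return min(front, key=score)

designs = [
    Design("A", 100, 5.0, 4096),
    Design("B", 40, 9.0, 8192),
    Design("C", 120, 6.0, 4096),   # dominated by A
    Design("D", 40, 9.0, 16384),   # dominated by B
]
front_names = [d.name for d in pareto_front(designs)]
balanced = pick(designs, (1, 1, 1)).name
wear_sensitive = pick(designs, (10, 1, 1)).name
```

With equal weights the sketch selects design A; weighting flash wearout ten times higher selects design B, which shows how the user weights steer the choice within the Pareto-optimal set.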
The last part of the approach solves the problem of modifying the original binary
code to support runtime reconfiguration without linking reloaded components at
runtime. This is done by inserting instrumentation code for all possible control flow
types into and out of components. The instrumentation code itself ensures that
no linking at runtime is needed as the control flow is always transferred to the
reconfiguration manager which in turn ensures a safe control flow transfer. The
instrumentation code itself is kept as minimal as possible to reduce the runtime
overhead involved in this method.
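The idea of routing every control flow transfer between components through a central manager can be sketched with a toy model. This is only an illustration under stated assumptions: the class and method names, the component names and sizes, and the first-in-first-out eviction policy are hypothetical and do not describe the reconfiguration manager developed in this thesis.

```python
class ReconfigurationManager:
    """Toy model: every transfer into a component passes through enter()."""

    def __init__(self, capacity, sizes):
        self.capacity = capacity   # bytes available for loaded components
        self.sizes = sizes         # component name -> size in bytes
        self.loaded = []           # resident components, oldest first
        self.flash_writes = 0      # number of component loads so far

    def used(self):
        return sum(self.sizes[c] for c in self.loaded)

    def enter(self, component):
        """Called by a (hypothetical) instrumentation stub before the jump."""
        if component in self.loaded:
            return False           # already resident, no reconfiguration
        size = self.sizes[component]
        if size > self.capacity:
            raise MemoryError(f"{component} does not fit into the load area")
        while self.used() + size > self.capacity:
            self.loaded.pop(0)     # FIFO eviction: an assumption of this sketch
        self.loaded.append(component)
        self.flash_writes += 1
        return True

# Hypothetical components whose total size (1800) exceeds the capacity (1200).
manager = ReconfigurationManager(1200, {"tcp": 600, "http": 500, "tls": 700})
trace = ["tcp", "http", "tcp", "tls", "tcp"]
reconfigured = [manager.enter(c) for c in trace]
```

Because the stub always asks the manager first, the calling code never needs to know whether or where the target component is currently loaded; the counter illustrates how each execution path translates into a number of flash writes.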
. thesis contributions
The main contribution of this thesis is the development of a complete methodology
which allows software containing legacy or proprietary parts to be transformed into
a reconfigurable system. This is done in a fine granular manner with the overall
goal of reducing the binary footprint of the system while ensuring hard real-time
constraints. This contribution is further classified by the following parts.
An approach for extracting components from binary objects is given, which is based
on using reverse engineered program information in combination with a set of con-
straints given by the system designer.
A design optimization methodology for optimizing the system design based on dif-
ferent system parameters including real-time constraints is proposed.
A tool has been developed which implements the complete methodology and auto-
matically adapts the original binary code by adding instrumentation code as appro-
priate.
All major parts of this thesis have been published at conferences. The fundamental
idea and a first concept were published in [BGKO]. The integration of the
reconfiguration methodology was introduced in [BGOb]. The concept of identifying
meaningful components from the binary objects was further integrated into the
work presented in [BBK+]. The overall approach, together with its evaluation,
was finally published in [BGOa], which was recognized as one of the best
papers.
. thesis outline
The thesis is organized as follows:
Chapter  gives an overview of the basic concepts found inside the literature,
which are used or referenced in parts of the approach proposed inside this thesis.
The chapter introduces some fundamental concepts which are assumed to be known
in the later part of this thesis.
Chapter  summarizes the related work. It considers binary analysis related
approaches on the one hand and reconfigurable software systems on the other hand.
A comparison of the related work and the approach proposed in this thesis is given
at the end of the chapter.
Chapter  introduces the general methodology of the reconfiguration approach. It
describes the requirements which need to be fulfilled and states the problems that
need to be solved. The chapter also states the architectural restrictions assumed
throughout the rest of the thesis and the open problems that will be solved in later
chapters.
Chapter  describes how the problem of decoding the application's binary code
is solved. It then describes the algorithm used to generate the control flow graph
from the decoded binary code. This chapter also summarizes existing solutions for
this problem, which are utilized by the approach.
Chapter  introduces the concept of extracting components from the binary code.
It discusses in detail the concept that is used to allow the system developer to select
components by means of high level constraints on object code API variables.
Chapter  focuses on the introduction of the reconfiguration manager. It describes
the reconfiguration activities at runtime, introduces the reconfiguration protocol
and describes the calculation of the response time of a task under reconfiguration.
Chapter  describes in detail the design space exploration for finding a good design
of the final system. It introduces an optimization step which is needed to decrease
the worst case execution time of a task under reconfiguration. Therefore, it describes
the calculation of the worst case blocking time and concludes with a parameterized
rating function which allows the user to select a final configuration of the system.
Chapter  gives an overview of the modifications made to the original object files
during the program transformation phase. It describes the technical modifications in
terms of symbol table changes and addition of instrumentation code for the possible
control flow types of a program.
Chapter  introduces the evaluation scenario, which is based on a smart-card
internet protocol stack implementation developed in cooperation with an industrial
partner. It describes the performance characteristics of the framework, the success
rate of the component identification process of Chapter  and the result of the design
space exploration phase used inside the optimization phase.
Chapter  finishes the thesis with a conclusion of the work and an outlook on future
research directions.
2
BASICS
This chapter introduces basic concepts used throughout the following chapters.
The first part gives an introduction to the architecture-specific basics used by the
reconfiguration framework. The chapter then introduces basic notations of program
representation and the fundamental concepts of program analysis
that are used in Chapter . The chapter concludes with the definition
of context sensitive graphs, which form a basis for the worst case reconfiguration
delay calculation used in the design exploration step in Chapter .
. smart-card soc
The design of embedded systems is considerably complex. It involves multiple
interdisciplinary activities including system modeling, testing and synthesis.
Embedded system designs are very often tied to time-to-market, cost, memory and power
constraints, which complicates the design of these systems. Because of tight cost
bounds, embedded systems are also often very resource constrained.
While traditional embedded systems consisted of multiple hardware components such as
a microcontroller, analog devices and I/O, today's systems are increasingly
integrated onto a single piece of silicon. Such a system is called a System-on-Chip (SOC)
[Jer]. These systems are usually domain specific, as they try to fulfill the specific
characteristics of their application domain to allow a cost-efficient design. Figure 
depicts a typical block diagram of a modern smart-card SOC. Different hardware
communication devices are connected to a central bus, which allows the Main
Processing Unit (MPU) to communicate with a connected terminal. The device often
contains three types of memory: static RAM (SRAM) for volatile data storage used by
the application, Read-Only Memory (ROM), often containing a bootloader, and flash
memory used for storing the OS/application code. The memory components
used inside the SOC are usually very small in smart cards, wireless sensor
nodes or other resource-constrained systems. Typical sizes of the non-volatile
memory space range between  and  Kb. More expensive SOCs with more memory are
available; however, their increased cost often prohibits their use in certain
domains.
[Figure: block diagram with USB 2.0, ISO7816, UART, SRAM, Flash, MPU, Crypto and ROM connected to a central BUS]
Figure : A typical block diagram of a smart-card SOC.
The block diagrams of other embedded SOCs are very similar. However, as they
are domain specific, they may differ in the set of communication devices used, the
number of processors, the domain specific co-processors and the memory type used.
Flash memory, however, is widely used because it is very cheap.
Depending on the domain, the support of cryptographic components and the speed
of the Main Processing Unit can vary. Typical operating frequencies of current smart
card generations range from  to  MHz.
In , as stated in [eet], "the embedded market now represented  percent
of all ARM-based processor unit shipments". Many SOC manufacturers use ARM
processor designs nowadays. The market share of ARM processors for highly
resource-constrained systems is even higher; estimates exceed % [tr].
However, no precise and objective values can be given here. In any case, the ARM
platform is frequently used in many systems. Thus, for the evaluation of the
methodology in this thesis, the ARMv Instruction Set Architecture (ISA) has
been used. Binary object code is based on the ARMv Embedded Application Binary
Interface (EABI) (containing: ARM Procedure Call Standard [ARMf], ARM
ELF [ARMe], the Base Platform ABI (BPABI) [ARMb], the C++ ABI, the
Exception Handling ABI [ARMc], the Run-time ABI [ARMg] and the C library
ABI [ARMd]) and is contained inside linkable ELF object files.
. object files
The approach proposed in this thesis is centered around object files as the
format in which an application is handed to the reconfiguration approach.
Thus, the following two sections introduce the specific
characteristics of the object format used by the approach. They briefly summarize
basic concepts of linkers and loaders [Lev].
Object files are created by assemblers and compilers and contain binary code and
data for a source file. They can be grouped to form libraries and are typically used
to provide reusable functionalities in form of linkable binary code for a specific
hardware architecture. Object files may be linkable, executable and loadable or any
combination of the three. Linkable object files contain information for a linker, which
allows the code to be combined with other linkable object files; usually but not
exclusively to create an executable object file out of it. Executable object files are
capable of being loaded into memory and run as a program. Loadable object files
can be loaded into memory at runtime and executed together with a
program.
An object file typically contains different kinds of information. Some of them are the
object code, containing the instructions and data, relocation information containing
a list of places that need to be adapted if the address of the object code is changed
and a symbol table containing information on global symbols used and defined by
the object code. In general there exist more categories of information as, e.g., debug
information that may be contained inside an object file. However, the listed cate-
gories form the set of information that is assumed to be contained inside an object
file to be linkable. This however is no hard restriction as legacy object files typically
contain at least these three kinds of information as many are provided as linkable
object file libraries.
This thesis will contain a binary transformation step which modifies parts of these
object files, including removing and adding code parts. Specifically the ELF Object
file format will be used in this context. This specific object file format is the topic
of the following section.
[Figure: left, the general ELF layout with ELF header, program header table (segment table, describes segments), sections, and section header table (section table, describes sections); right, a sample relocatable file with the sections .text, .data, .rodata, .bss, .rel.text, .symtab, .rel.data, .rel.rodata and .strtab]
Figure : The structure of the ELF object file format and a sample relocatable ELF file.
. the elf object file format
The ELF object file format is commonly used in the UNIX
community. It was first published in the year  in the
System V Application Binary Interface Specification [AT&] and is now the
standard binary file format for Unix-like systems. The ELF format is a flexible object
file format which can be used for various purposes. One of them is the development
of cross-compiled embedded applications or libraries, and extensive tool support
has developed around it.
The ELF object file format comes in three different flavors, which correspond to the
object file kinds listed in the previous section, in the same order: relocatable,
executable and shared object. Figure  gives an overview of the general structure
of the file format. The program header table describes the segments of the file. One
segment can contain multiple sections, as seen in Figure . Segments are used by the
system loader to place logically related sections in the same memory region. The
program header table is thus optional for relocatable files, as they contain data
that needs to be processed by a linker first, so that all dependencies are resolved and
the final addresses of all symbols are calculated before they may be loaded onto a
device. Executable ELF files have all relocations applied and contain runnable code.
Shared objects are libraries that contain symbols and runnable code parts, which
can be linked into a program at runtime.
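The three flavors can be told apart by the e_type field of the ELF header. The following minimal sketch decodes the fixed 52-byte header of a 32-bit ELF file using the field layout of the System V ABI; the function names and the synthetic header are illustrative assumptions and are not part of the tool developed in this thesis.

```python
import struct

ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object"}
EM_ARM = 40  # e_machine value assigned to ARM

def classify_elf32(header: bytes):
    """Decode the fixed-size header of a 32-bit ELF file."""
    if len(header) < 52 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 1:
        raise ValueError("only 32-bit ELF files are handled in this sketch")
    endian = "<" if header[5] == 1 else ">"
    # The fields after the 16-byte e_ident array, in specification order.
    (e_type, e_machine, _version, _entry, _phoff, _shoff, _flags, _ehsize,
     _phentsize, e_phnum, _shentsize, e_shnum, _shstrndx) = struct.unpack_from(
        endian + "HHIIIIIHHHHHH", header, 16)
    return {
        "kind": ELF_TYPES.get(e_type, "other"),
        "is_arm": e_machine == EM_ARM,
        "program_headers": e_phnum,
        "section_headers": e_shnum,
    }

def make_header(e_type, e_phnum, e_shnum):
    """Build a synthetic little-endian ARM header, for demonstration only."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    return ident + struct.pack("<HHIIIIIHHHHHH", e_type, EM_ARM, 1, 0, 0, 200,
                               0, 52, 0, e_phnum, 40, e_shnum, e_shnum - 1)

info = classify_elf32(make_header(e_type=1, e_phnum=0, e_shnum=9))
```

For the synthetic relocatable header the sketch reports zero program headers, which matches the statement that the program header table is optional for relocatable files.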
An example relocatable ELF file is shown on the right hand side of Figure .
A typical relocatable file contains the sections listed in the
figure. Additional section names are defined by the corresponding ELF architecture
description and may be hardware dependent. The sections include the following:
• .text contains the executable code.
• .data contains the static data used by the executable code.
• .rodata contains the data that is read-only, e.g., constants.
• .bss contains no data and is allocated at runtime. It is typically used for zero
initialized data structures.
• .rel.text, .rel.data and .rel.rodata contain the relocation information for the
corresponding section. They define the places that need to be fixed if the
absolute memory position of the section changes.
• .symtab contains the symbol table of the object file. Symbols may be
referenced as relocation symbols by relocation entries inside relocation sections.
• .strtab is a table of name strings for the section names or the symbol table.
It is important to note that executable files do not need to specify any symbols, as
the binary code is already linked and needs no further processing. This is one
of the reasons why the methodology proposed in this thesis assumes the binary
code to be available as relocatable object files. Nonetheless, this restriction does
not significantly reduce the usability of the approach, as most third party code is
usually available as relocatable object code. The benefit gained by this assumption
is useful for decoding binary code, as described in Chapter .
. program representation
This section will introduce two program representation forms, namely the Control
Flow Graph and the Call Graph. The component extraction and optimization steps
will work on these data structures. Thus, they are described here to avoid uncer-
tainties.
.. Control Flow Graph
The Control Flow Graph (CFG) is a basic representation technique used by code
optimizers, program analyzers and various other kinds of tools. It is based on
constructing a graph of basic blocks for each procedure of the program, representing
the control flow. The graphs are then used for different kinds of analyses, one of
them being the data flow analysis. In general, the concept of the CFG is not
restricted to one level of code representation such as machine code or high level
instructions. A CFG may be generated for any level of code representation.
A graph may be graphically represented in different ways. One graphical represen-
tation can be seen in Figure . Different graphical variations exist that may contain
additional information associated with edges or nodes. Nodes are used to represent
different kinds of units of interest. A control flow graph uses nodes to represent
Basic Blocks of a program. However, before formally defining a Basic Block it is
important to define basic units of a program.
Definition .. (Instruction Sequence):
An instruction sequence of program P is a sequence of instructions which are stored
in consecutive memory locations inside the image of program P.
Using this definition we can now define a Basic Block:
Definition .. (Basic Block):
A Basic Block of program P is a maximal instruction sequence of program P that
has only one entry point and one exit point.
In general every program can be uniquely partitioned into a set of non-overlapping
basic blocks, which makes it possible to clearly represent a program as a control
flow graph. A control flow graph is a directed graph which represents the flow of
control of a program during execution. The nodes of a control flow graph represent
the Basic Blocks of the program. The edges of the control flow graph represent the
possible flow of control between the corresponding basic blocks.
[Figure: a directed graph with the four nodes n1, n2, n3 and n4]
Figure : A simple directed graph.
Definition .. (Control Flow Graph):
A Control Flow Graph CFG for a program P is a directed graph G = (N,E) with:
• Every node n ∈ N represents exactly one basic block of the program P.
• Every edge e = (n1,n2) ∈ E represents the flow of control from the
basic block represented by n1 to the basic block represented by n2 inside
program P.
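The definitions above can be made concrete with a small sketch that partitions a toy program into basic blocks and derives the CFG edges. The instruction encoding (tuples of a kind and an optional target index) is a hypothetical stand-in for decoded machine instructions, and the classic leader-based partitioning is used here for illustration only; it is not necessarily the algorithm used later in this thesis.

```python
def build_cfg(program):
    """Return basic blocks as half-open index ranges plus the CFG edges."""
    n = len(program)
    if n == 0:
        return [], []
    # A leader starts a block: the first instruction, every jump target,
    # and every instruction that follows a control transfer.
    leaders = {0}
    for i, (kind, target) in enumerate(program):
        if kind in ("jmp", "br"):
            leaders.add(target)
        if kind in ("jmp", "br", "ret") and i + 1 < n:
            leaders.add(i + 1)
    starts = sorted(leaders)
    blocks = list(zip(starts, starts[1:] + [n]))
    block_of = {start: idx for idx, (start, _) in enumerate(blocks)}
    edges = set()
    for idx, (_, end) in enumerate(blocks):
        kind, target = program[end - 1]          # last instruction of the block
        if kind in ("jmp", "br"):
            edges.add((idx, block_of[target]))   # taken branch or jump
        if kind in ("op", "br") and end < n:
            edges.add((idx, block_of[end]))      # fall-through
    return blocks, sorted(edges)

# Toy program: "op" = ordinary instruction, "br" = conditional branch,
# "jmp" = unconditional jump, "ret" = return. Targets are instruction indices.
program = [
    ("op", None),   # 0
    ("br", 4),      # 1
    ("op", None),   # 2
    ("jmp", 5),     # 3
    ("op", None),   # 4
    ("ret", None),  # 5
]
blocks, edges = build_cfg(program)
```

The toy program yields four basic blocks that form a diamond: the first block branches to two alternative blocks, which both continue in the final block.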
.. Call Graph
A Call Graph (CG) is a directed graph which represents the calling relationship
of functions inside a program. Every node in a call graph represents a function.
Edges correspond to function calls. Call graphs may be categorized into varying
degrees of precision and uses. The most precise call graph is the context-sensitive
call graph which contains a separate node for every call depending on the execution
context. A fully context-sensitive call graph representation may easily take up a
lot of memory and take a long time to calculate if computed statically. However,
if computed dynamically and used for dynamic program optimization at run-time
based on the caller stack this representation may efficiently be used.
Context-insensitive call graphs only use one node for every function of the program
and thus do not include information on the calling context. An analysis based on
a context-insensitive graph is thus less precise than one based on a
context-sensitive graph. Figure  demonstrates the context-insensitive call graph of
the example program shown in Figure .
[Figure: call graph with the nodes entry(), f1() and f2()]
Figure : The context insensitive call graph of the example program seen in Figure 
procedure entry
    F()
    F()
end procedure
procedure F
    ...
    F()
    ...
end procedure
procedure F
    ...
    F()
    ...
end procedure
Figure : Example Program
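The difference between the two kinds of call graph can be sketched as follows. The call relation used here (entry calls f1 and f2, and f1 calls f2) is an assumption made for illustration; it reuses the function names of the figure but is not claimed to be the exact example program. Contexts are modeled as call strings, and a depth bound keeps the sketch terminating for recursive programs.

```python
def context_insensitive(calls):
    """One node per function, one edge per caller/callee pair."""
    nodes = set(calls)
    edges = {(f, g) for f, callees in calls.items() for g in callees}
    return nodes, edges

def context_sensitive(calls, root, max_depth=8):
    """One node per call string, i.e. per path of calls starting at root."""
    nodes = []
    def visit(context):
        nodes.append(context)
        if len(context) < max_depth:      # bound needed for recursive calls
            for callee in calls[context[-1]]:
                visit(context + (callee,))
    visit((root,))
    return nodes

# Hypothetical call relation: function name -> list of called functions.
calls = {"entry": ["f1", "f2"], "f1": ["f2"], "f2": []}
ci_nodes, ci_edges = context_insensitive(calls)
cs_nodes = context_sensitive(calls, "entry")
```

In this sketch f2 is a single node in the context-insensitive graph but appears twice in the context-sensitive one, once per calling context, which illustrates why the latter is more precise and also larger.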
. data flow analysis
Conventional Data Flow Analysis refers to the process of gathering information
about the set of values computed at certain points inside a program. Tools use the
information gathered by the data flow analysis process to perform “code-improving
transformations” [Kil, MR, BAR, Kno] such as removing redundant register
load operations, optimizing the register usage of a program by means of a live
register analysis [ASU], or solving the type inference problem [KU, KU]. Although
many of these problems were discussed years ago, data flow analysis remains
a fundamental technique for all optimizations that require information on the state
of a program at runtime. More recent application scenarios are, e.g., security
related program monitoring and intrusion detection analysis [XMZX, WD].
Code-transformation tools also benefit from the information gathered by a data flow
analysis; dynamic binary translators such as QEMU [Bel] are one example.
Generally speaking the use of a CFG or a call graph together with data flow anal-
ysis techniques has been used for a long time for various reasons. Some important
techniques used inside this thesis will be explained in the following subsection.
.. Data Flow Analysis Equations
In the context of this thesis, data flow equations are used inside the binary
analysis of legacy applications. They are used to analyze register contents at
different program locations in order to reverse engineer the program semantics of
branch conditions as far as possible, as shown in Chapter . As this step relies on
fundamental data flow analysis techniques, the most important ones related to the
work in this thesis are introduced in this section.
A data flow problem is a quadruple (G,H,L, T) with G being the intra- or
inter-procedural control flow graph, H a label function, L a lattice and T a transfer
function. The lattice L models the data flow information. Its elements are called
data flow facts. The lattice contains all possible facts of a program that hold for
any possible path at any program point. The information may be as simple as, e.g.,
the availability of expressions at a program point, or may contain information on the
types of variables. The transfer function T models the semantics of the program. It
modifies data flow facts with respect to a program fragment that would be executed.
The combination of T and L is called a data flow framework. The label function H
connects a specific program with a transfer function: it assigns a transfer
function to each node of the control flow graph G.
Although data flow equations can generally be defined on any kind of data flow
framework, this work uses a bit vector data flow framework on the register level of
the application, as the analyzed instructions are machine level instructions of the
application's object code.
The following definitions are based on the definitions and equations found inside the
literature [ASU, FL, CSF].
Definition .. (Register Definition):
A register is defined if the content of the register is modified by an instruction. If a
definition d of a register in basic block Bi is the last definition of that register
in Bi, we call the definition locally available.
Definition .. (Register Usage):
A register is used if the register is referenced by an instruction.
Definition .. (Definition Range):
A definition d of a register x in basic block Bi reaches basic block Bj if
1. d is a locally available definition from Bi
2. ∃ Bi → Bj ∀ Bk ∈ (Bi → Bj) : k ≠ i ∧ k ≠ j ∧ Bk does not redefine x
Definition .. (Killed Register):
A definition of a register in a basic block Bi kills all definitions of the same register
that reach Bi.
Definition .. (Dead Register):
A definition d of a register in a basic block Bi is dead if d is not used before being
redefined along all paths from Bi.
Definition .. (Upward Exposed Use):
A use u of a register x is upwards exposed in a basic block Bi if either:
• u is used in Bi and has not been previously defined in Bi
• ∃ Bi → Bk with u being locally exposed from Bk ∧ ∄ Bj, i ≤ j < k, which
contains a definition of x.
Definition .. (Live Definition):
A definition d is said to be live at basic block Bi if d reaches Bi and there is an
upward exposed use of d at Bi.
Definition ..:
The following sets will be used:
• def(Bi) is the set of register definitions locally available in block Bi
• in(Bi) is the set of register definitions that reach block Bi
• kill(Bi) is the set of register definitions that are killed in block Bi
Given these definitions we are now able to define a typical data flow equation that
needs to be solved in the process of a data flow analysis.
\[ \mathit{out}(B_i) = \mathit{def}(B_i) \cup \bigl(\mathit{in}(B_i) \setminus \mathit{kill}(B_i)\bigr) \]
The equation above describes the information that is available at the end of block
Bi. This may either be the information that is directly generated or defined inside
the block or the information that reaches the block and is not killed. The set in(Bi)
may be described as an equation as well:
\[ \mathit{in}(B_i) = \bigcup_{n \in \mathit{pred}(B_i)} \mathit{out}(n) \]
The equation given above is classified as an any-path problem, as it combines the
information arriving along any path to the block. One may as well describe the in set
as an all-paths problem, which defines the input of a basic block as the
intersection of the information arriving along all paths to the block. Both problems
are said to be forward-flow, as the output is determined based on the input.
It is also possible to solve the equations in the reverse direction, which is called
backward-flow: the input values are calculated based on the output values. In general,
data flow analysis equations can be classified into four classes, which are shown in
Table  (taken from [FL]).
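\[
\begin{array}{l|l|l}
 & \text{Forward-Flow} & \text{Backward-Flow} \\
\hline
\text{Any path}
 & \begin{array}{l}
     \mathit{out}(B_i) = \mathit{def}(B_i) \cup \bigl(\mathit{in}(B_i) \setminus \mathit{kill}(B_i)\bigr) \\
     \mathit{in}(B_i) = \bigcup_{n \in \mathit{pred}(B_i)} \mathit{out}(n)
   \end{array}
 & \begin{array}{l}
     \mathit{in}(B_i) = \mathit{def}(B_i) \cup \bigl(\mathit{out}(B_i) \setminus \mathit{kill}(B_i)\bigr) \\
     \mathit{out}(B_i) = \bigcup_{n \in \mathit{succ}(B_i)} \mathit{in}(n)
   \end{array} \\
\hline
\text{All paths}
 & \begin{array}{l}
     \mathit{out}(B_i) = \mathit{def}(B_i) \cup \bigl(\mathit{in}(B_i) \setminus \mathit{kill}(B_i)\bigr) \\
     \mathit{in}(B_i) = \bigcap_{n \in \mathit{pred}(B_i)} \mathit{out}(n)
   \end{array}
 & \begin{array}{l}
     \mathit{in}(B_i) = \mathit{def}(B_i) \cup \bigl(\mathit{out}(B_i) \setminus \mathit{kill}(B_i)\bigr) \\
     \mathit{out}(B_i) = \bigcap_{n \in \mathit{succ}(B_i)} \mathit{in}(n)
   \end{array}
\end{array}
\]
Table : Data Flow Analysis Equations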
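The four classes of the table differ only in the direction of the flow and in the operator used to merge information. A minimal sketch makes this explicit by parameterizing one simple round-robin solver; the function name, the diamond-shaped example graph and the definition labels d0 to d2 are hypothetical. For the all-paths variants the sketch assumes the usual initialization with the full set of facts and treats blocks without predecessors (or successors, in backward flow) as receiving the empty set.

```python
def solve(blocks, succ, defs, kill, forward=True, any_path=True):
    """Iterate the equations of one table cell until nothing changes."""
    pred = {b: [] for b in blocks}
    for b in blocks:
        for s in succ[b]:
            pred[s].append(b)
    sources = pred if forward else succ       # blocks the information comes from
    universe = set().union(*defs.values())
    incoming = {b: set() for b in blocks}
    result = {b: (set() if any_path else set(universe)) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            parts = [result[n] for n in sources[b]]
            if not parts:
                merged = set()                # boundary block
            elif any_path:
                merged = set().union(*parts)
            else:
                merged = set.intersection(*parts)
            new_result = defs[b] | (merged - kill[b])
            if merged != incoming[b] or new_result != result[b]:
                incoming[b], result[b] = merged, new_result
                changed = True
    if forward:
        return {"in": incoming, "out": result}
    return {"in": result, "out": incoming}

# Diamond-shaped CFG: B0 branches to B1 and B2, which both continue in B3.
blocks = ["B0", "B1", "B2", "B3"]
succ = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
# d0: r0 defined in B0, d1: r0 redefined in B1, d2: r1 defined in B2.
defs = {"B0": {"d0"}, "B1": {"d1"}, "B2": {"d2"}, "B3": set()}
kill = {"B0": {"d1"}, "B1": {"d0"}, "B2": set(), "B3": set()}

any_fwd = solve(blocks, succ, defs, kill, forward=True, any_path=True)
all_fwd = solve(blocks, succ, defs, kill, forward=True, any_path=False)
any_bwd = solve(blocks, succ, defs, kill, forward=False, any_path=True)
```

In the forward any-path case all three definitions reach B3, whereas in the all-paths case none does, because no single definition arrives along both branches of the diamond.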
Solving Data Flow Equations
Data flow problems may further be classified into intra- and inter-procedural
data flow problems. The former solve the equations inside a single subroutine
of the program without taking values of other subroutines into account. The latter
incorporate the solutions of calling and called subroutines into the process of
solving the equations for one subroutine. Inter-procedural data flow analysis may
further be categorized by the following characteristics:
• Flow-sensitivity : an inter-procedural analysis is called flow-sensitive if the
control flow of the caller is considered when analyzing the called function (e.g., the
intra-procedural data flow analysis results are propagated from the caller to
the callee). Flow-sensitive data flow analysis is more precise than its
flow-insensitive counterpart.
• Context-sensitivity : in contrast to context-insensitive analysis, context-sensitive
data flow analysis considers the calling context (e.g., represented by the
call stack) whenever a function call target is analyzed. This allows for a
higher precision, as the analysis information generated for this call incorporates
the calling context and can be propagated back to the calling statement.
A system of data flow equations can be solved iteratively by recomputing the data
flow sets until a fixed point is reached. This may, e.g., be done by the work-list
approach, which repeatedly takes a basic block from a work-list and re-evaluates its
equations. A possible initial work-list is simply the list of all basic blocks.
Whenever one of the sets in or out of a basic block has not yet reached a fixed
point, the successors of that block are re-inserted into the work-list. The iteration
ends when no block is left inside the list.
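The work-list approach can be illustrated with a minimal Python sketch for a forward-flow, any-path problem such as reaching definitions. The function name, the dictionary-based graph representation and the three-block example are assumptions made for illustration only; `gen` stands in for def(Bi), since `def` is a reserved word in Python.

```python
from collections import deque

def solve_forward_any_path(blocks, succ, gen, kill):
    # Solves out(B) = gen(B) | (in(B) - kill(B)), in(B) = union of out(p) over pred(B).
    pred = {b: set() for b in blocks}
    for b in blocks:
        for s in succ.get(b, ()):
            pred[s].add(b)
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set() for b in blocks}
    worklist = deque(blocks)              # initial work-list: all basic blocks
    while worklist:
        b = worklist.popleft()
        in_sets[b] = set().union(*(out_sets[p] for p in pred[b]))
        new_out = gen[b] | (in_sets[b] - kill[b])
        if new_out != out_sets[b]:        # no fixed point yet: revisit successors
            out_sets[b] = new_out
            for s in succ.get(b, ()):
                if s not in worklist:
                    worklist.append(s)
    return in_sets, out_sets

# Hypothetical example: B1 -> B2 -> B3 -> B2 (a loop).
# d1 and d3 define the same register, d2 defines another one.
blocks = ["B1", "B2", "B3"]
succ = {"B1": ["B2"], "B2": ["B3"], "B3": ["B2"]}
gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": {"d3"}}
kill = {"B1": {"d3"}, "B2": set(), "B3": {"d1"}}
in_sets, out_sets = solve_forward_any_path(blocks, succ, gen, kill)
```

In this example the definition d3 reaches the loop header B2 over the back edge, which is only discovered because B2 is re-inserted into the work-list after B3 has been processed.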
.. Data Flow Problems
This section describes two data flow problems that are parts of the binary analysis
and the component extraction step of the approach. The copy propagation analysis,
described in the next section, will be used as a technique to resolve indirect return
statements inside the binary code. The ud/du-chains described afterwards are used
for the component extraction steps.
Copy Propagation
Copy Propagation is a process used by compilers for the optimization of programs. The
goal is the "replacement of all occurrences of targets of direct assignments with their
values" [ASU]. It is typically incorporated into a forward-flow, all-paths data flow
analysis. As an example, let us consider the first line in Table .
Instruction   New Instruction   Copy Table
: y=x         y=x               (y, x)
: z=y         z=x               ((y, x), (z, x))
Table : Copy propagation example.
The assignment of x to y can be propagated to the uses of y on all paths on which
neither y nor x is redefined. Thus, the last line of the example table may be
replaced with z = x. The condition for copy propagation is usually checked by using
ud-chains, which are explained in the following section. For low level programming
languages, copy propagation can also be performed on registers instead of variables.
We call this form of copy propagation Register Copy Propagation. Similar to most
data flow analysis algorithms, copy propagation can be done inside one function only
or across function boundaries. In the former case it is called global copy propagation,
in the latter case interprocedural copy propagation.
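A minimal Python sketch of copy propagation on straight-line code is given below. It is restricted to a single basic block, so the ud-chain check reduces to invalidating copy table entries when one of their registers is redefined. The instruction format `(dest, op, operands)` and the function name are assumptions made for illustration.

```python
def propagate_copies(instructions):
    # instructions: list of (dest, op, operands); op "copy" means dest = operands[0]
    copies = {}                            # copy table: target -> source
    result = []
    for dest, op, operands in instructions:
        # replace every used name by the source it is currently a copy of
        operands = tuple(copies.get(name, name) for name in operands)
        # a redefinition of dest invalidates all copies that involve dest
        copies = {t: s for t, s in copies.items() if t != dest and s != dest}
        if op == "copy" and operands[0] != dest:
            copies[dest] = operands[0]
        result.append((dest, op, operands))
    return result

# The example from the table: y = x; z = y becomes y = x; z = x.
table_example = propagate_copies([("y", "copy", ("x",)), ("z", "copy", ("y",))])

# If x is redefined in between, the copy (y, x) is killed and z = y must stay.
killed_example = propagate_copies([
    ("y", "copy", ("x",)),
    ("x", "add", ("x", "one")),
    ("z", "copy", ("y",)),
])
```

The second example shows why the source of a copy has to be tracked as well: after x has been overwritten, replacing y with x would change the meaning of the program.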
UD/DU-chains
A define-use chain (du-chain) is a data structure which contains for every definition
of a register the uses of the register that can be reached from the definition. Its
counterpart is the use-define chain (ud-chain) which contains for every used register
all definitions of the register that reach the use. Ud/du-chains are built by solving
the reaching definitions data flow problem. The reaching definitions problem can
be solved by a forward-flow, any-path data flow analysis on the powerset of all
definitions inside the program.
Listing  demonstrates the du/ud-chain data structure for a small example. The
second parameter of the data structure is the instruction (here referenced as line
number).
1  r3 = r4 * 2;   // du(r3,1) = {3,4}
2  r4 = r4 + 1;   // du(r4,2) = {}
3  r4 = 4 * r3;   // du(r4,3) = {4}  ud(r3,3) = {1}
4  r3 = r3 * r4;  // ud(r3,4) = {1}  ud(r4,4) = {3}
Listing : DU/UD chain example
Building the chains can be very beneficial for program optimization. For example,
the define-use chains can be used for dead-register elimination. In the example, line
two could be removed, as the definition of register r4 in that line is never used. This
can easily be seen, as the du-chain for this definition is empty. Inside this thesis
ud/du-chains will be used as a criterion for the reconstruction of program semantics
in Section ..
. contexts
Context-sensitive analysis allows for a higher precision, while analyzing across proce-
dure boundaries, as the calling context is incorporated into the analysis. During the
analysis the path that has been taken to get to a specific method or basic block is
stored and the analysis is based upon this specific path. Each basic block can, thus,
be assigned to a set of contexts that represent the possible paths in order to get to
this basic block. The most common way of representing this context information is
the call strings approach [SP].
Definition .. (Call Strings):
The set of call strings S is defined as S ⊆ E∗.
A call string represents a sequence of edges taken during the program execution.
A context is an element ω ∈ S. In the presence of loops or recursion a call
string may grow without bound. In order to avoid this, most approaches use mapping
functions which, e.g., limit a call string to a certain length k. These so-called
k-length call strings consider only the last k edges taken, thus limiting the number
of possible contexts. A call string of length 0, e.g., corresponds to a
context-insensitive analysis.
In order to combine call strings with edges during a path traversal we define a
connection function ⊕ :
Definition .. (Context Connector):
The connection function ⊕ is a function
⊕ : E∗ × E→ E∗
which connects two call strings in the following way:
⊕ e1 = (e1)
(e1, e2, .., en)⊕ en+1 = (e1, e2, .., en, en+1)
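The connection function and the k-length limitation mentioned earlier can be sketched in a few lines of Python. Call strings are modelled as tuples of edge names; the function name `connect` and the optional parameter `k` are assumptions for illustration and not part of the definition.

```python
def connect(call_string, edge, k=None):
    # Appends edge to call_string; if k is given, only the last k edges are kept.
    extended = call_string + (edge,)
    if k is None:
        return extended                    # unlimited call strings
    return extended[-k:] if k > 0 else ()  # k = 0: every context collapses to ()

unlimited = connect(("e1", "e2"), "e3")
limited = connect(("e1", "e2"), "e3", k=2)
```

The case k = 0 is handled separately because the slice `extended[-0:]` would return the complete tuple in Python.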
Using the call strings approach we may now define context sensitive graphs.
.. Graphs with Context
Contexts can be used to define context sensitive graphs, which enrich the representa-
tion of, e.g., a CG with context information. Similar to [The] we derive the context
sensitive graph Gc from its context insensitive representation G in the following way.
Definition .. (Context-sensitive Graph):
We recursively define the context-sensitive graph Gc = (Nc, Ec) with Nc ⊆ N × S,
Ec ⊆ Nc × Nc derived from the graph G = (N, E) with the set of start nodes
Nstart ⊆ N representing the program entry points:
• All start nodes are part of the context-sensitive graph with the empty context ε:
∀n ∈ Nstart ⇒ (n, ε) ∈ Nc
• Edges between nodes change the context according to ⊕:
∀(n, ω) ∈ Nc, (n, n′) ∈ E ⇒
(n′, ω ⊕ (n, n′)) ∈ Nc, ((n, ω), (n′, ω ⊕ (n, n′))) ∈ Ec
.. Example
As an example for a context-sensitive graph, let us consider the call graph seen in
Figure  a). After recursive expansion of the graph G the context-sensitive call
graph Gc is depicted in b). The node f3 is represented two times inside Gc with
different contexts, as it can be reached over the functions f1 or f2. This information
is encoded inside the call strings (e1, e2) and (e2, e4) respectively. Sometimes the
call strings just contain the function names instead of the corresponding edges for
simplicity.
Figure : Demonstration of the context sensitive graph construction: a) a
context-insensitive call graph G, b) the corresponding call graph with context Gc
Recursions inside the call graph would lead to an infinite number of nodes and
edges. However, using k-length call strings leads to an unrolling of the first
k iterations of the recursion. All following iterations are abstracted into
one node, resulting in a finite number of nodes. The abstraction results in a loss of
precision but ensures the termination of data flow analysis algorithms on the
context-sensitive graph.
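The recursive definition and the effect of k-length call strings can be sketched in Python as follows. The call graph used here is a hypothetical one in which f3 is called from f1 and f2; it is not necessarily identical to the graph of the figure. Edges are represented as pairs of function names, and the function name `expand` is an assumption for illustration.

```python
from collections import deque

def expand(edges, start_nodes, k):
    # Builds Gc = (Nc, Ec) from G using call strings limited to the last k edges.
    succ = {}
    for n, m in edges:
        succ.setdefault(n, []).append(m)
    nodes = {(n, ()) for n in start_nodes}   # start nodes carry the empty context
    ctx_edges = set()
    todo = deque(nodes)
    while todo:
        n, omega = todo.popleft()
        for m in succ.get(n, ()):
            new_omega = (omega + ((n, m),))[-k:] if k > 0 else ()
            target = (m, new_omega)
            ctx_edges.add(((n, omega), target))
            if target not in nodes:          # each (node, context) pair is expanded once
                nodes.add(target)
                todo.append(target)
    return nodes, ctx_edges

call_graph = [("main", "f1"), ("main", "f2"), ("f1", "f3"), ("f2", "f3")]
nodes, ctx_edges = expand(call_graph, ["main"], k=2)
f3_contexts = [omega for (n, omega) in nodes if n == "f3"]

# A recursive call f3 -> f3 still yields a finite graph for a fixed k.
rec_nodes, _ = expand(call_graph + [("f3", "f3")], ["main"], k=2)
rec_f3_contexts = [omega for (n, omega) in rec_nodes if n == "f3"]
```

Without the recursive edge, f3 appears with two contexts. With the recursive edge and k = 2 the expansion terminates with five contexts for f3, the last of which, ((f3, f3), (f3, f3)), stands for all deeper recursion levels.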
. summary
This chapter introduced the basic notations and concepts used by the approach
described in the rest of this work. The first part introduced some basics about
object files and the information stored in ELF object files. This information will
be important for binary decoding, which is explained in Chapter . The second
part of this chapter focused on program analysis in general. The control flow graph
was introduced and an overview of basic concepts and notations used by data flow
analysis was given. The techniques introduced will be used for the binary analysis
of the legacy code throughout the rest of this thesis. Precisely, Chapter  describes
the generation of the control flow graph, which is used inside the data flow analysis
described in Chapter . For the purpose of extracting components out of the control
flow graph the data flow analysis techniques are used inside Chapter .

3
RELATED WORK
This chapter focuses on the related work, which is divided into two categories. The
first category covers binary analysis approaches that use techniques related to
those used in the context of this work. The second category covers reconfigurable
systems in general and introduces state-of-the-art reconfigurable operating
systems.
. binary analysis approaches
In the context of this work binary analysis techniques are used. One of the
intended uses of the approach presented here is the reduction of the run-time memory
requirements of an application while maintaining its functionality. Typically,
solutions to this problem can be set into the context of post link-time optimization.
Post link-time optimization is a very comprehensive research area, and its approaches
may be classified into static and dynamic ones.
.. Link-Time Optimization
Many static program optimization approaches supported by modern linkers are
combined under the name link time optimization. They enhance traditional compiler
optimizations by using inter-procedural data flow analysis techniques to derive global
application information at or after link time in order to apply inter-procedural
optimizations that increase the performance and/or decrease the footprint
of an application.
Static Optimization Approaches
Post link-time approaches exist for nearly all architectures. Optimizations for the
ARM platform have been described by De Sutter et al. in [DSVPC+, DBDSVP+].
They use live register analysis, constant propagation and copy propagation data flow
analysis as well as control flow analysis techniques to eliminate dead code or
perform loop unrolling, thus increasing the performance and decreasing the footprint
of the application. Schwarz et al. and Luk et al. describe similar optimizations for
the Intel architecture in [SDAL, kLMP+]. Generally speaking, a set of tools is
available which offers link time optimization for many platforms. One of them is
the Diablo Link Time Optimizer [PCB+], which supports link time optimization
for the ARM, i386 and PowerPC architectures. The widely used open source
GNU Compiler Collection (GCC) [Pro] also supports link time optimization in its
latest release.
Some program optimization techniques may be used by dynamic and/or static op-
timizers. This includes the optimization of loops by loop unrolling or improved in-
struction scheduling of often used execution paths in order to reduce pipeline stalls
[SDJ]. Trace Scheduling is another optimization technique used for Very Long
Instruction Word (VLIW) processors in order to increase the Instruction Level Par-
allelism. For example the creation of superblocks [mWHMC+] or hyper-blocks
[Aug] may be applied in this context. In order to perform these global optimiza-
tions profile information of the application is often used. By profiling the execution
of the application an optimizer may further improve the runtime performance of an
application by aggressively optimizing heavily used program parts. The profile in-
formation may either be given as a static input to the compiler or may be generated
and used at run-time by Just-In-Time (JIT) compilers.
Dynamic Optimization Approaches
Dynamic optimization approaches can most often be found in dynamic binary
translation tools. Most JIT compilers, e.g., those of the Java virtual machine, use
dynamic profile-based optimizations. They use binary analysis techniques to increase
the performance of an application based on run-time information gathered during
execution of the program. At run-time the compiler may then decide to recompile parts
of an application to create specialized code as described in [SYK+, BCF+, CLS,
IKY+, YMP+]. Even though implementations of JIT compilers exist for
resource-constrained devices, they are currently only used on specialized embedded
systems such as Java Smart Cards, which usually offer enough memory. Existing
implementations [GPF, BTS] range between  kilobytes and multiple megabytes in
size. However, some embedded SoCs do not offer enough resources for these
systems. Smaller implementations of the Java virtual machine may be usable for those
systems. However, it is unlikely that they are going to be developed soon.
Several approaches make use of runtime code generation to generate optimized code
dynamically. The approach of [LL] tries to exploit runtime constants to dynamically
perform loop unrolling, while Rhiger [Rhi] uses partial evaluation. However, this is
less suited for embedded systems, as the additional overhead of runtime code
generation is undesired.
While all of these optimization approaches can be listed as related work in the
scope of this thesis, as they all try to optimize a specific program for a specific
optimization parameter (e.g., binary footprint), the approach of this thesis cannot
directly be placed into the field of program optimization. It does, however, have to
cope with the same kind of problem that post link-time optimizers have to cope with:
the analysis of a given binary program to derive information for
optimization.
.. Problems solved by Binary Analysis
Static binary analysis has been recognized by researchers and industry as a very
promising technique for software quality assurance [Wag, LR]. Some approaches,
for example, use binary analysis techniques to ensure security rules before or during
program execution [BDD+, BDEK]. The authors of [KRV] use binary analysis
techniques to detect kernel-level malware, while the authors of [CvdB] statically
analyze downloaded programs to detect malware.
Research has also evolved around analyzing binaries for conformance testing. The
authors of [VG] propose the use of binary analysis techniques to check whether
third-party libraries conform to a specified coding standard. Their approach makes
use of data flow analysis on machine code level to detect non-compliant usage of
pointers or arrays. Kinder et al. [KV] use control flow and data flow analysis to
statically check untrusted driver binaries for conformance to specifications of the
API used.
Theiling [The, The] uses interprocedural context sensitive control flow analysis
to calculate the worst case execution time of a given binary program. His tool
execctrl also has to cope with binary analysis problems as, e.g., the reconstruction
of a control flow graph of a binary program, which needs to be solved by the approach
inside this thesis as well.
Cifuentes et al. built a decompiler called dcc [CSF, CEa], which is able to
generate C source code out of simple binary programs. The approach uses control flow
analysis techniques to detect loop types, and data flow analysis to detect compiler
idioms and function calls. Using copy propagation principles they describe a way of
generating high level expressions out of assembler instructions. For this purpose they
introduced the Semantic Specification Language (SSL) as a platform-independent
abstraction of an ISA. It allows the specification of architecture-dependent
instructions in an unambiguous way, which can be used for automatic processing. The
principle of this language forms a basis for some parts of the approach described in
Section ..
. program analysis problems
All program analysis tools that need to reconstruct the control flow graph
of a binary application have to cope with a series of problems that need to be
solved by program optimization tools in general. This is also true for the approach
described inside this thesis.
Analyzing and modifying binary code is fraught with problems. For some of these
problems solutions exist, others are in general unsolvable. This section gives an
overview of some of the most important theoretical and practical problems.
.. Code Discovery
One of the major problems that needs to be solved is the so called Code Discovery
Problem. It refers to the problem of distinguishing between executable instructions
and data inside a binary program. Without any information on the binary program,
this problem is equivalent to solving the halting problem as it is unknown whether
an instruction will be executed or not. The problem arises whenever a data word
inside a binary code block may be misinterpreted as an instruction. Figure  shows
some code for which two versions of interpretation for the ARM THUMB ISA exist
(see Appendix A.).
       ...                             ...
0x3001 ; add r0,#1              0x3001 ; add r0,#1
0x4718 ; bx  r3         vs.     0x4718 ; bx  r3
0x3002 ; add r0,#2              0x3002 ; .word 0x3002
Figure : Illustration of the Code Discovery Problem. The word at address 0x3002 may
either be interpreted as an instruction or a data word.
The word 0x3002, following the word 0x4718, may either be an instruction
that is executed by the program at runtime or a data word that is used for
computations. The problem arises whenever a data value coincides with a valid
instruction encoding. It is not known whether the word will be
used as data, as an instruction or even both until the value is fetched from memory
and used by the processor. Even on architectures that
support the separation of data and instructions into different segments, data may
still be contained inside an executable segment, e.g., in the form of jump tables.
These are often used by compilers to execute, e.g., case statements efficiently.
Although the halting problem is recursively unsolvable [Dav], heuristics have been
proposed which show a good instruction detection rate in real world applications.
However, no general solution exists to this problem, which makes this problem a
major blocking factor for methods that rely on the complete and correct detection of
instruction and data segments. In general, binary transformation tools need to rely
on some additional information provided to ensure a safe program transformation.
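The ambiguity can be made concrete with a deliberately tiny decoder for two 16-bit Thumb encodings, written in Python. It only covers the immediate move/compare/add/subtract format and the bx instruction with the bit layouts assumed in the comments; the function name is hypothetical and the sketch is not a complete disassembler.

```python
def decode_thumb16(halfword):
    # Decodes two 16-bit Thumb encodings and returns None for anything else.
    if halfword >> 13 == 0b001:                    # 001 op(2) Rd(3) imm8
        op = ("mov", "cmp", "add", "sub")[(halfword >> 11) & 0b11]
        rd = (halfword >> 8) & 0b111
        return f"{op} r{rd},#{halfword & 0xFF}"
    if halfword & 0xFF87 == 0x4700:                # 010001 11 0 Rm(4) 000
        return f"bx r{(halfword >> 3) & 0xF}"
    return None

decoded = [decode_thumb16(w) for w in (0x3001, 0x4718, 0x3002)]
```

The decoder happily returns an add instruction for 0x3002. Whether this word is ever executed after the indirect branch bx r3 cannot be decided from the encoding alone, which is exactly the code discovery problem.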
.. Self modifying code
Another problem that arises when trying to statically analyze and interpret binary
code is self modifying code. The term refers to instructions that are modified during
the execution of the application. Whenever the architecture allows the executable
memory region to be modified at runtime, an application may modify memory loca-
tions to create new instructions that are not known statically. Initially this method
had been used on computers that did not have much memory. By reusing data as
instructions and copying them to dedicated program parts as needed it was possible
to save memory.
       ...
0x20:  0x4679 ; mov r1,pc        // r1 contains address 0x24
0x22:  0x6008 ; str r0, [r1,#0]  // r0 is stored at address 0x24
0x24:  0x46c0 ; nop              // nop instruction is replaced
       ...
Figure : An example of self modifying code inside the ARM THUMB ISA. Address
x is modified by the instruction at address x.
Nowadays self-modifying code is mostly used for other purposes. One purpose is the
obfuscation of program code, as used by encryption algorithms or malware
applications. Another source of self-modifying code are programs such as binary
translators that dynamically translate platform-independent code into executable
instructions of the execution platform. An example of self-modifying code for the
ARM THUMB ISA is given in Figure . At runtime the nop instruction at memory
address 0x24 is overwritten by the preceding store instruction with the content of
register r0. Statically deriving the behavior of the program at this memory location
may be impossible.
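The effect of the example can be traced with a small hand-written simulation in Python. It is only a sketch under several assumptions: memory is modelled as a dictionary of halfwords, the halfword at address 0x26 is a hypothetical filler, the store is assumed to be a little-endian 32-bit store, and the two instructions are executed by hand instead of being decoded.

```python
def simulate(r0):
    # Static image of the figure; the halfword at 0x26 is a hypothetical filler.
    memory = {0x20: 0x4679, 0x22: 0x6008, 0x24: 0x46C0, 0x26: 0x46C0}
    static_image = dict(memory)
    # 0x20: mov r1,pc  -> in Thumb state pc reads as instruction address + 4
    r1 = 0x20 + 4
    # 0x22: str r0,[r1,#0]  -> 32-bit store, assumed little-endian
    memory[r1] = r0 & 0xFFFF
    memory[r1 + 2] = (r0 >> 16) & 0xFFFF
    return static_image, memory

static_image, runtime_image = simulate(0x30023001)
```

A static decoder sees the nop encoding 0x46c0 at address 0x24, whereas at run-time the processor fetches whatever r0 contained, here the encoding 0x3001.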
.. Indirect Control Flow Target Detection
Another problem exists if the control flow of a program needs to be known precisely.
Branch instructions may generally be classified into two types. On
the one hand, branch instructions may contain a branch offset encoded inside the
instruction itself. The destination of such branches is known either directly or at
link time of the binary object. On the other hand, instructions may use the content
of a register as the branch destination. In the former case the control flow is said
to be direct, in the latter case it is said to be indirect.
Most indirect control flows are due to jump tables that are generated by the
compiler to speed up switch/case statements. The targets of these branches can be
computed with high precision, as was shown in [CEb]. The return statement of a
method is also implemented by the compiler using indirect control flow statements.
As long as the callers of a method are known, the destinations of this kind of
indirect control flow can easily be computed.
More problematic sources of indirect control flows are method pointers, available
in most high-level languages, e.g., to implement inheritance or to allow dynamic
program behavior. The targets of this kind of indirect control flow are very hard to
compute, and currently no approach exists which can guarantee the precise detection
of all targets.
.. Detecting Idioms
Sometimes a series of instructions has a specific semantic that can be important for
the analysis of binary code. These sequences may represent different kinds of high
level instructions. In this thesis the definition of [Cif] for an idiom will be used:
Definition .. (Idiom):
An idiom is a sequence of instructions that has a logical meaning which cannot be
derived from the individual instructions.
Most idioms are known, as they, e.g., reflect the calling convention of the
underlying Application Binary Interface (ABI). Others describe high level operations,
like mathematical operations on data types longer than the hardware register size. An
overview of a small set of these idioms for the x86 architecture is
given in [Cif]. However, this list is not complete, and additional important idioms
are introduced in this thesis to allow the use of high level data structures for the
reconfiguration process.
. tools
A series of tools has been developed which allow the monitoring, debugging and
verification of binaries. Jakstab [KV, KZV] provides a framework for the x86
instruction set and allows the static analysis of binaries for program verification.
In the same context the program IDAPro [IDA] may be listed. Both support the
approximate construction of the control flow graph and call graph of applications and
support the interception of program flow. Fundamental decoding techniques used by
both tools are also used by the framework proposed in this thesis. Chipounov et al.
give an overview of the tools currently available in [CC]. However, most tools require
the source code of the application and may not be used for analyzing legacy code.
Another tool related to the context of this thesis is Pin [kLCM+]. Pin allows
the insertion of instrumentation code into a running application
using local application debugging mechanisms. Using this tool it is possible to insert
code during the execution of an application for various purposes. The idea of adding
instrumentation code can be extended to reloading reconfigurable components into
an application, as proposed inside this thesis. The instrumentation code itself
could then contain the instructions of the component that is loaded. However, the
tool only supports desktop applications for Linux and Windows machines and allows
a user to add code to applications running on the same machine. It is not
designed for adding instrumentation code on remote platforms such as
resource-constrained embedded systems. Nevertheless, the addition of instrumentation
code is a fundamental concept used by the approach proposed in this thesis.
.. Binary Decoding
Before any of the existing tools can analyze program properties or modify the ap-
plication the binary code inside the object files needs to be decoded into a sequence
of assembler instructions. This process is called binary decoding.
ELF object file sections: ELF header, ..., .text, .symtab, .strtab, .rel.text, ...

.text (binary)     sequential decoding with symbol annotation
...                ...
                   0:  <ip6_init>
0:  47 70          0:  bx    lr
2:  46 c0          2:  nop
                   4:  <ip6_out>
4:  b5 f0          4:  push {r4,r5,r6,lr}
6:  b0 85          6:  sub  sp, #20
8:  91 00          8:  str  r1, [sp, #0]
a:  93 02          a:  str  r3, [sp, #8]
c:  ab 0a          c:  add  r3, sp, #40
e:  78 1b          e:  ldrb r3, [r3,#0]
10: 43 00          10: bl   <ip6_send>
...                ...

.symtab (symbol annotation)
addr | name
0    | ip6_init
4    | ip6_out

.rel.text (relocation symbol annotation)
addr | name     | class
10   | ip6_send | R_ARM_THM_CALL
Figure : Top-down disassembly of binary object code with symbol annotation.
The typical approach is the so-called sweep algorithm depicted in Figure . Programs
like binutils [GNU] belong to this category of binary decoding tools. The algorithm
sequentially parses binary words and disassembles them into the corresponding
assembler instructions, starting with the first word of the binary code. The binary
code is taken from the .text section of the ELF object file. The result of the process
can be seen on the right hand side of Figure . Together with the symbol table of
the object file it is possible to annotate the assembler code with the corresponding
symbols listed inside the object file. This allows the detection of, e.g., function
boundaries. Not all functions need to be visible this way, as the object file
might have been stripped of all symbols that are not globally visible. Third-party
libraries are usually stripped in this way, as companies do not want any information
to be available that is not needed to link an object file, in order to protect their
intellectual property. However, all functions declared as global, and thus all API
functions of a library, can be detected using the symbol table.
Additionally, the relocation information stored inside the relocation table of the
object file will be used to identify the symbol names of external function calls
(functions that are not defined in the same object file). These function calls are
resolved by the linker at link time and replaced by the address of the symbol inside
the global application memory space. The type information stored inside the table
allows the linker to identify the ISA encoding of the function call.
A major problem sweep-based approaches suffer from is the code discovery problem
as described in Section ... Sequential decoding of instructions does not safely
distinguish between code and data, even if the function boundaries are known. Data
words, e.g., constants or long-jump target addresses that are loaded into registers
at run-time, are very often stored in between executable instructions. However,
some architectures, such as the ARM architecture, define ABIs which force compilers to
emit special symbol table entries that can be used to safely distinguish between
code and data words inside the executable (.text) area of an object file. All
ELF object files that conform to the ARM ELF specification need to contain the
following symbols (see section .. of [ARMe]):
• Whenever a sequence of ARM (32-bit) instructions starts, the symbol table
must contain the symbol $a at the corresponding address.
• Whenever a sequence of THUMB (16-bit) instructions starts, the symbol table
must contain the symbol $t at the corresponding address.
• Whenever a sequence of data words starts, the symbol table must contain the
symbol $d at the corresponding address.
The incorporation of this information makes a sweep-based algorithm, as used inside
binutils, feasible for use inside this reconfiguration framework. However, indirect
jump target detection still remains a problem.
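How the mapping symbols guide a sweep can be sketched in Python. The sketch assumes that the mapping symbols have already been read from the symbol table as (address, name) pairs and that the size of the .text section is known; the function names and the example layout are hypothetical. Actual instruction decoding is left out.

```python
def classify_regions(mapping_symbols, text_size):
    # Turns (address, "$a" | "$t" | "$d") pairs into (start, end, kind) regions.
    kinds = {"$a": "arm", "$t": "thumb", "$d": "data"}
    symbols = sorted(mapping_symbols)
    regions = []
    for i, (addr, name) in enumerate(symbols):
        end = symbols[i + 1][0] if i + 1 < len(symbols) else text_size
        regions.append((addr, end, kinds[name]))
    return regions

def code_addresses(regions):
    # Yields the addresses a sweep would decode; data regions are skipped.
    step = {"arm": 4, "thumb": 2}          # instruction width in bytes
    for start, end, kind in regions:
        if kind != "data":
            yield from range(start, end, step[kind])

# Hypothetical layout: THUMB code, a 4 byte literal at 0x10, THUMB code again.
regions = classify_regions([(0x00, "$t"), (0x10, "$d"), (0x14, "$t")], 0x18)
addresses = list(code_addresses(regions))
```

The literal at 0x10 is never handed to the decoder, so it cannot be misinterpreted as an instruction.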
Most other architectures do not provide this kind of information inside the ABI.
In order to tackle the code discovery problem for those architectures as well as
the discovery of indirect jump targets, recursive traversal based approaches have
been proposed. Examples are the tools execctrl by AbsInt [The, The], dcc
by Cifuentes et al. [CSF, CEa], IDAPro [IDA] and the approach described in
[FMPS]. The basic principle is to decode the instructions
while following the control flow of the application. The decoding sequence starts
at the entry points of the application as the words at the entry points must be
instructions. The approach then sequentially decodes all words until a branch is
detected. All targets of the branch define new starting points for the decoding
sequence.
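The traversal principle can be sketched in Python on a pre-decoded toy program. The dictionary `program` plays the role of the instruction decoder and maps an address to a hypothetical triple (size, kind, targets); the instruction kinds and the example addresses are assumptions for illustration. Indirect branches are modelled as instructions without known targets.

```python
def recursive_traversal(program, entry_points):
    # program: address -> (size, kind, targets); returns the addresses found to be code
    code = set()
    todo = list(entry_points)              # entry points must be instructions
    while todo:
        addr = todo.pop()
        while addr in program and addr not in code:
            size, kind, targets = program[addr]
            code.add(addr)
            todo.extend(targets)           # branch targets start new decoding sequences
            if kind in ("jump", "return"):
                break                      # no fall-through to the next word
            addr += size

    return code

program = {
    0x00: (2, "plain", []),
    0x02: (2, "branch", [0x08]),           # conditional branch: target and fall-through
    0x04: (2, "plain", []),
    0x06: (2, "jump", [0x0C]),             # unconditional branch
    0x08: (2, "plain", []),
    0x0A: (2, "return", []),               # indirect branch, target unknown
    0x0C: (2, "return", []),
    0x0E: (2, "plain", []),                # a data word that happens to decode
}
code = recursive_traversal(program, [0x00])
```

The word at 0x0e is never reached by any known control flow and is therefore not classified as code, whereas a sweep would have decoded it. The sketch also shows the limitation: code that is only reachable through the unresolved indirect branches would be missed in the same way.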
The problem of indirect jump target detection has been tackled inside the research
community in different ways. execctrl relies on special coding conventions used
by compilers and incorporates this information into the analysis. Cifuentes et al.
use pattern matching to identify code patterns for indirect jumps, e.g., switch
statements. IDAPro also uses compiler patterns to identify this kind of code.
Another set of approaches uses static analysis instead of compiler patterns
to resolve indirect jumps [FMPS].
The CFG which can be derived using these tools, either by using a classical sweep
algorithm or state of the art tools, is in general unsafe as the precise detection of all
jump targets cannot be guaranteed without restrictions. Code transformations based
on these graphs, thus, may not be possible. In general it is not possible to precisely
calculate all branch targets under all conditions. However, over-approximations exist,
which will be used inside this thesis to ensure a safe representation of the binary
code.
. reconfigurable/adaptable systems
A large number of reconfigurable or adaptable systems are used in
resource-constrained environments. Especially in the field of Wireless Sensor
Networks, reconfiguration is of major importance, e.g., for maintenance purposes.
Reconfiguration is achieved by different methods, which can be classified into the
following categories: full binary upgrades, modular binary upgrades and virtual
machine based systems.
TinyOS [Lev] is a popular operating system for highly resource constrained sys-
tems. It uses a monolithic binary image of the complete application which can be
combined with a network bootloader to perform full image upgrades of the system.
Even small changes in the functionality of the application results in a transfer of
the complete binary image (often between  and  KB). Some optimizations allow
the sending of a diff image, consisting only of the changes between the two images,
which results in a shorter update procedure. Keller. et. al proposed in [KH] to
create so called "delta files", which contain the byte streams of the adaptations
to be made on binary level. However, the delta files are created by compiling the
adaptations from source code for the different kinds of configurations.
Modular binary upgrade systems comprise mainly of a run-time loader and linker.
The loader is responsible for allocating appropriate resources for new modules in
the system to execute. The linker is responsible for resolving all references or depen-
dencies between the modules and other components in the system. SOS [HKS+]
and Contiki [DFEV] operating systems allow modular binary upgrades at run-
time. Contiki even supports linking of modules given in a compact ELF binary
format. However, linking and loading ELF files on the node increases the reconfiguration
overhead significantly, making this approach unsuitable for highly resource
constrained systems such as smart cards. Nevertheless, the diff-based optimizations
suggested for full image upgrades may also be applicable to modular binary
upgrades in order to decrease the overhead.
A virtual machine based system offering reconfiguration for resource constrained
devices has been introduced by the MATE VM [LC] running on top of TinyOS.
Virtual machines like MATE or Tapper [XLC] allow for the implementation of
application logic by a powerful script language which is interpreted at runtime.
The virtual machine's capabilities are fixed; applications, however, may be exchanged.
DAVIM [MHJV] relaxed this restriction by allowing the virtual machine to be
reconfigured at run-time. VM* [KP] is another virtual machine implementation,
which interprets Java bytecode. VM* compacts the class representation and automatically
synthesizes a virtual machine that natively implements some of the system
classes. One of the problems in using these virtual machine based concepts in the
context of this thesis is the dependency on the source code of the applications to be
run.
The reconfigurable systems listed above may be used with some modifications to
reduce the memory requirements of an application by reconfiguration. However, none
of these systems fully supports legacy binary code. If the source
code is available, the existing solutions offer a powerful method to add reconfiguration
support to embedded systems. The concept of transferring delta files is similar to
the concepts proposed in this thesis, with the difference of delta files being forced
to specified static memory places. Additionally, the existing solutions produce delta
files from the source code of the application. The problem of producing delta files
from binary code is not solved and requires an approach as proposed inside this
thesis. Additionally, modular upgrade systems assume all dependencies to be solved
if an upgrade is done.
.. Structural Reconfiguration Mechanisms
In addition to the categorization of the reconfiguration approaches described above,
the mechanisms applied to allow structural changes differ as well. They can be categorized
into the following categories: indirection mechanisms, relinking mechanisms
and design pattern based mechanisms. Table  gives an overview of the categories
and the techniques used by them.
Indirection Mechanism         Description                                       Reference
Function pointer indirection  By changing a function pointer the execution      [Fab]
                              path may be changed at runtime.
Meta-object protocol          Modification of program behavior by supporting    [BCA+], [WS]
                              introspection and intercession.
Debugging                     Using debugging mechanisms the execution of a     [kLCM+]
                              program is intercepted and modified.

Relinking Mechanism           Description                                       Reference
Code relinking                Links between entities are redirected on          [HN], [HKS+], [DFEV]
                              exchange.
Architectural connectors      Architectural abstractions are used to mediate    [JMMV], [KM]
                              communication between software modules. A
                              reconfiguration involves replacing the connector.

Design Pattern                Description                                       Reference
Proxy pattern                 Provides a placeholder for another object to      [SM], [Kin]
                              control access to it.
Strategy pattern              Encapsulates a family of algorithms and makes     [KRL+]
                              them dynamically interchangeable.
Decorator pattern             Attaches additional responsibilities to an        [TVJ+]
                              object dynamically.
Table : Overview of different structural reconfiguration mechanisms based on [Jan].
Indirection mechanisms introduce a software layer between software entities which
intercepts the control flow between software parts. This can be done by simple
pointer manipulation operations, which allow the reconfiguration system to call
different functions at runtime. Another technique relies on using meta objects to
provide all reconfiguration related operations. These so-called meta object protocols
are very often applied using the component based programming paradigm. The technique
is often used in virtual machine based systems. Recently, debugging
techniques have been used to introduce dynamic program behavior. By exploiting
the debugging mode of many processors it is possible to inject code or dynamically
change the control flow of a program.
Mechanisms based on relinking, in contrast, avoid the cost of indirection by adapt-
ing all references between entities after new code blocks have been loaded. Some
approaches listed above use linkers and loaders to allow new objects to be added to
the system at runtime. The technique, however, needs to keep track of all references
at runtime to be able to change them whenever a new entity is loaded into the
system.
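The bookkeeping cost of relinking can be made concrete with a small model. It is only a sketch under simplifying assumptions: a CallSite object stands in for a patched address inside loaded code, and the class and method names are hypothetical. They are not taken from SOS, Contiki or any other cited system.

```python
class CallSite:
    """Models one location in loaded code that refers directly to a target."""

    def __init__(self, target):
        self.target = target

    def invoke(self, *args):
        return self.target(*args)


class Relinker:
    """Keeps track of every reference so that it can be patched later."""

    def __init__(self):
        self.references = {}  # symbol name -> list of CallSite objects

    def link(self, symbol, func):
        """Create a reference to `symbol` and remember where it lives."""
        site = CallSite(func)
        self.references.setdefault(symbol, []).append(site)
        return site

    def replace(self, symbol, new_func):
        """Patch all recorded references; returns the number of patches."""
        sites = self.references.get(symbol, [])
        for site in sites:
            site.target = new_func
        return len(sites)
```

The number of patches grows with the number of references, which is the run-time cost that an indirection layer with a single table entry per entity avoids.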
The last group of mechanisms for reconfiguration can be categorized as design pattern based
mechanisms. Using different sets of design patterns it is possible to achieve structural
reconfiguration. Systems, however, need to be designed upfront using the design
patterns.
. conclusion
Many approaches have been created to solve parts of the goals described in this
thesis. Link-time optimization approaches allow binary code to be optimized for
speed and memory requirements by the use of control flow and data flow analysis.
This is a valuable technique which already yields large benefits for software programs
and is commonly used by state of the art compilers. However, it does not solve the
general problem that the execution space may still be too small for an application to run on
an embedded device. It is also not intended as an approach which allows software
programs to be adapted at runtime.
Program analysis approaches in general have been used for many reasons. Most
of the efforts are concerned with analyzing source code, which is not available in
the scope of this thesis. Approaches which analyze binary code use the analysis
results for solving related kinds of problems. On the one hand, they are used for
link-time optimization. On the other hand, they are used to cope with security issues of
applications, quality assurance or compliance testing. Also the highly complicated
process of decompiling binary programs relies on data flow analysis techniques to
regain high level instructions from binary code. Common basic concepts will be
reused inside this thesis. However, they are used to extract information on the
binary objects which will enable run-time reconfiguration of binary objects.
Run-time reconfiguration approaches have been proposed for very small embedded
devices for varying goals. It has been shown to be indispensable for some kinds of
applications as it allows for re-tasking, fixing bugs, adding functionality or replacing
functionality due to memory restrictions. Full or diff-based binary upgrade systems
suffer from large overheads. Linker and loader based systems are more fine-grained.
However, the modules linked into the systems are not optimized in a way link-time
optimization allows for static systems. The link-time optimization step would have
to run on the node itself as all dependencies are only visible there. Linking and
loading on a node also introduces overheads. In [HF] a very small component
based architecture has been developed to run on small embedded devices; including
a run-time loader. It has been shown that run-time linking and loading can be done
efficiently with a small code overhead. The reconfiguration unit is, however, limited
to the object files used by the system. Essentially, a reconfiguration component
is equal to the complete object file loaded inside these approaches. A selection
of specific parts of these object files is not supported by any system to date.
Additionally, the reconfiguration architecture proposed by the approach in this thesis
introduces a smaller overhead to the system than the approaches listed above. This
will be shown inside the evaluation in Chapter .
Script-based or virtual machine-based systems are not fully suitable for the targeted
system as they all rely on the availability of the source code of an application to
re-implement it as a script or as bytecode. Program transformation would allow
the use of a virtual machine based reconfiguration system. However, the problems
that would need to be solved for legacy applications would be similar to the ones
described inside this thesis.
Reusing parts of legacy code objects as components inside reconfigurable systems
has not been completely tackled yet. Additionally, the systems offering
reconfiguration do not optimize their components with respect to the worst case
execution time or other important embedded parameters. The approach in this
thesis fills this gap.
Part II
THE APPROACH

4
LEGACY CODE RECONFIGURATION
Within this chapter the legacy code reconfiguration approach will be introduced.
First the requirements of the approach are introduced, then the overall methodology
will be explained. The chapter concludes with a list of open problems that will be
solved amongst others in the rest of this thesis.
. requirements
The proposed approach allows the use of code from legacy libraries in a fine-granular
reconfiguration environment without the drawbacks of current state of the art ap-
proaches. Common approaches either do not allow legacy libraries to be used or
simply wrap the complete legacy library into one huge component. This, however, is
not optimal for highly resource-constrained systems. Libraries are typically not provided
as high level code, which might be rewritten and optimized for reconfiguration
support. This is often due to legal restrictions. Third party code, however,
is very often shipped as object code together with high level header files to allow
seamless integration into software projects. Thus, a low level method is needed which, on
the one hand, is able to extract components out of low-level libraries and, on the
other hand, allows reconfiguration support to be added to them. Forcing the
system developer to do this manually is highly undesirable and often impractical,
as the expert knowledge required to do this cannot be assumed to be available.
With this in mind the approach proposed in this thesis focuses on the following
requirements:
• Usability: The approach shall be as automatic as possible. Converting parts of
the legacy code into reconfigurable components shall be supported by a tool
the user can use to easily configure the system parameters.

[Figure: Relocatable Binary Object, Component Constraints and Configuration (input); Binary Analysis; Component Identification & Optimization; Binary Rewriting; Component Linking; Rewritten Object; Linker; generated Executable Reconfigurable Binary and Relocatable Components; Reconfiguration Framework with Reconfiguration Manager, which loads components on demand.]
Figure : Overview of the reconfiguration methodology.
• Run-Time Efficiency: component loading and replacement shall be as simple
as possible without any need of linking the components at run-time. The
execution overhead at runtime shall be kept as small as possible.
• Correctness: The semantics of the legacy code must not be changed.
. methodology
The overall methodology, which is based on the requirements listed in the previous
section, is depicted in Figure . Referring to Figure  in Chapter  the idea is to
seamlessly integrate the approach into the standard development tool flow inside the
linking process. Thus, the only input the approach uses is given by the relocatable
object files (or libraries) together with their high level header files, which describe
the interface of the objects. Instead of directly delivering these object files to the
linker, the files are modified by a series of steps, which will be described in the
following sections. The modified object files are then linked within the standard
tool flow and an executable reconfigurable binary is generated.
In contrast to existing reconfiguration systems, in which components used by a
reconfigurable system are specified by the programmer in a high level language,
components first need to be created from the object code of the system itself. The
code usually is not written with reconfiguration support in mind. The approach
described in this thesis, thus, first has to extract components from the object files.
The standard tool flow is enriched by binary analysis, component identification,
binary rewriting and a component linking step; each of them will be explained in
detail in the following chapters. The basic idea is to use a binary analysis framework
to generate the CFG and the CG of the binary objects and enrich them with high
level information to allow the user to specify rules, which can be used to extract
components that can be used for the run-time reconfiguration process.
The original binary objects are automatically modified to contain reconfiguration
enabling routines. The modified object files are then reintroduced into the standard
linking flow to generate the executable reconfigurable binary. The executable is then
capable of loading components at runtime from a specified server location. This
might be a terminal connected to the embedded device using a realtime-capable
bus.
In order to avoid the necessity of linking the reloaded components at runtime, con-
trol flows between components and other parts of the system are replaced with a call
to the reconfiguration manager. The reconfiguration manager then controls the flow
to other components based on static offsets determined at link time of the system.
The approach can, thus, be categorized as an indirection mechanism based recon-
figuration using function pointer indirection techniques. The original object code,
which needed to be linked, will automatically use an indirection layer for control
flow passing between components. The use of indirection techniques in combina-
tion with off-line linking is well suited for the purpose of the reconfiguration as it
ensures deterministic and very short load times and a very small overhead of the
reconfiguration manager.
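The interplay of indirection and off-line linking described above can be sketched as follows. This is a simplified model, not the implementation of this thesis: components are modeled as tables that map link-time offsets to callables, a dictionary stands in for the server location, and all names (ReconfigurationManager, call, unload) are hypothetical.

```python
class ReconfigurationManager:
    """Routes every inter-component control flow and loads on demand."""

    def __init__(self, component_store):
        self._store = component_store  # stands in for the server location
        self._table = {}               # one table entry per loaded component
        self.loads = 0                 # number of load operations performed

    def call(self, component_id, offset, *args):
        """Transfer control to `offset` inside the given component.

        The offset is assumed to be fixed at link time, so no symbol
        resolution (run-time linking) is needed here.
        """
        entries = self._table.get(component_id)
        if entries is None:
            # Component is not resident: trigger a reconfiguration.
            entries = self._store[component_id]
            self._table[component_id] = entries
            self.loads += 1
        return entries[offset](*args)

    def unload(self, component_id):
        """Remove a component, e.g. to free memory for another one."""
        self._table.pop(component_id, None)
```

Loading a component only touches the single table entry of that component, which is what keeps the load time short and deterministic in this model.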
. consistency preservation
Referring to Goudarzi [MG], a dynamic software reconfiguration yields a correct
system if after completing the reconfiguration process:
. the system satisfies its structural integrity requirements,
. the entities in the system are in mutually consistent states and
. the application state invariants hold.
Based on these requirements the following subsections will discuss in which way the
requirements are fulfilled by the approach proposed in this thesis.
.. Integrity
The first requirement states that the structural integrity of the software system is
ensured while under reconfiguration. The structural integrity requirements define
in which way software components communicate and transfer control between each
other. This may be further categorized into the three requirements of maintaining
the referential integrity, the interface compatibility, and dependencies of software
entities.
The reconfiguration approach proposed in this thesis ensures referential integrity by
the use of an indirection scheme described in Chapter . References are detected prior
to the system deployment and replaced with an indirection layer. The exchange
of components at runtime only involves updating one single table entry for the
component loaded to ensure the referential integrity.
Interface compatibility is guaranteed as the code reconfigured at runtime is statically
available prior to link time. The compatibility is, thus, already guaranteed by the
possibility to link the object code using traditional compilation tools. The addition
of completely new functionality after system deployment needs to solve the problem
of ensuring the interface compatibility at any program point on binary level. This,
however, exceeds the scope of this thesis, but it is a very interesting
direction for future research.
Dependencies between entities can be a problem for distributed systems. In the
context of this work dependency relates to possible control flows between software
entities on the same node. The dependencies between components are extracted by
the approach described in Chapter . Dependency requirements are automatically
resolved at runtime by triggering a reconfiguration if the dependent component is not
loaded. As memory efficiency is of utmost importance for this work, the dependency
is solved on demand. However, dependencies are calculated off-line and used to
ensure the temporal consistency of the system.
Footnote: as reconfiguration based on the approach in this thesis can happen theoretically at any program
point, not only on function call level.
.. Consistency
In addition to the previous requirements, the system under reconfiguration needs
to ensure the consistency of the software components. A reconfiguration must not
change the state of the system such that the system progresses towards an error
state. Most solutions to this requirement are based on freezing parts of the system
whenever a reconfiguration is triggered. This will also be a technique used by the
approach described here.
The reconfiguration used inside this thesis does not change the state of components,
whether loaded or unloaded. It is not invasive, in the sense that component states
are neither changed nor made incompatible with the newly loaded component, as only the
binary code of a task is temporarily removed from the system. If inconsistencies
exist they are due to a faulty system design before reconfiguration has been applied.
The functional behavior of tasks cannot be changed by the reconfiguration of an-
other task. While the system will be kept in a mutually consistent state this way,
the timing, however, may be influenced. As the timing behavior of embedded sys-
tems is very important the approach ensures a deterministic timing behavior under
reconfiguration. The details are covered in Chapter .
.. State-Invariant
Preserving program state-invariants is a requirement concerned with the preservation
of states over the complete system under reconfiguration. As an example, let us
assume the replacement of a component which generates unique identifiers. The new
component must not produce the same identifier again in order to satisfy the state
invariant requirement. As the scope of this thesis is not concerned with replacing
components by new implementations, this topic will not be discussed any further.
. architecture restrictions of this thesis
As one part of the approach concerns the analysis of the binary objects, all of
the problems listed inside Section . need to be considered. Generally speaking
these problems make it difficult to use binary code inside optimization or code
transformation approaches. The list of problems may even be extended by "tricks"
that are used inside malicious programs such as viruses. While the general case of
these problems remains unsolvable, the use of certain restrictions on the application
and the hardware architecture allows binary code to be used inside the methodology
proposed in this thesis. The restrictions on the binary code are given as follows:
a. The binary code does not contain any malicious or self-modifying code.
b. The binary code conforms to some ABI that allows the distinct detection of
all instructions and data words.
c. The binary code is contained in a set of statically linkable object files which
are used as the input to the approach.
d. The API of the object files is given in a high level representation as, e.g.,
C-Header files.
The use of statically linkable objects as an input to the approach is important as the
additional information contained inside the object files allows the safe reconstruction
of the control flow graph of the application. This will be shown in the next chapter.
The restriction to statically linkable object files together with their high
level API description does not limit the applicability of the approach too much.
Software developers using third party implementations need at least this kind of
information to be able to link third party code into their binaries.
Although the implementation, evaluation and the examples in this thesis are based
on the ARM architecture, the approach is not limited to this architecture. As long
as the restrictions listed above are fulfilled the approach can be used on other archi-
tectures.
. open problems
The restrictions in Section . prohibit self-modifying code and code which makes
use of undetectable data/code mixing. Most applications found inside embedded
systems fulfill these restrictions. Self-modifying code will rarely be found inside
applications which are targeted by the approach in this thesis. Thus, such code is excluded
from the analysis.
Additionally, applications for highly resource constrained systems such as smart
cards are often written for the ARM architecture, which uses an ABI that allows the
distinction of data and binary code, thus already fulfilling the second restriction.
This will be shown in the next section. An open problem is still the identification of
idioms and the identification of indirect jump targets. The former problem will be
discussed in Chapter . Approximate solutions to the latter problem are described
in the next chapter.
After the problems of decoding an application are solved, a solution to the identifi-
cation of possible reconfiguration components from the binary objects needs to be
given. As the application is given as binary code traditional approaches to describe
components on source code level are not applicable. In addition, object
code is not intended to be edited manually by application developers.
It is processed by several tools and, in contrast to source code files, does not offer
support for editing or changing parts of the application code.
Thus, the approach presented in this thesis proposes a process for automatic iden-
tification of program parts based on high level rules the user can easily define. The
concept for this is explained in Chapter .
The components, resulting from the approach in Chapter  will then need to be
optimized, as the system performance heavily depends on various parameters. How
this is done is described in Chapter . The last problem that needs to be solved is
the transformation of the original application into the system which allows reconfig-
uration at runtime. As this needs to be done on binary level the binary object files
need to be rewritten, which is covered in Chapter .

5
CONTROL FLOW RECONSTRUCTION
The fundamental data structure binary transformation tools operate on is, generally
speaking, the interprocedural control flow graph. This chapter will focus on the
problem, how this program representation can be re-constructed from the binary
application code, specifically in the context of the target architecture used for this
thesis.
The first part will cover the decoding of the binary code, followed by the construction
of the control flow graph. It will then describe the handling of indirect control flow
edges to ensure a safe program transformation. The main goal of the control flow
reconstruction phase of the approach is the generation of a safe program representa-
tion. A safe representation covers all possible control flows inside the program that
may happen at runtime. A safe representation does not need to be precise. It just
ensures that all references and control flows are represented. A safe representation
may be an over-approximation of the real control flows. A precise representation
covers exactly the real control flows and references. By the combination of several
well known approaches described in this chapter the framework developed in this
thesis is able to safely analyze and transform binary application code, while trying
to be as precise as possible.
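The difference between a safe and a precise representation can be stated as a set relation. The following sketch assumes that control flows are modeled as (source address, target address) pairs. The edge sets are invented for illustration only.

```python
def is_safe(represented, real):
    """Safe: every control flow that can really occur is represented."""
    return real <= represented


def is_precise(represented, real):
    """Precise: exactly the real control flows are represented."""
    return represented == real


# Hypothetical flows of a conditional branch at address 0x24.
real_flows = {(0x24, 0x26), (0x24, 0x36)}
over_approximation = real_flows | {(0x24, 0x2E)}  # one spurious edge
under_approximation = {(0x24, 0x26)}              # one missing edge
```

The over-approximation is safe but not precise, whereas the under-approximation is neither, which is why only over-approximations are acceptable as a basis for rewriting.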
After the binary decoding step the object code is available as annotated assembler
code as seen in Figure . The code is then processed in the next step, which
generates the interprocedural control flow graph representation of the application.
 In the context of this thesis the framework operates on a complete graph representation of the
binary code based on basic blocks, which may contain only partial procedure information. This
may be the case in stripped binary object files. However, it will further be called inter-procedural
control flow graph to conform to the standard naming conventions inside the literature.

[Figure: activity diagram with the activities Linear code analysis, Data Flow Analysis, Alias Analysis and Hell Node Insertion, and the decisions "Unresolved indirect branches?" and "Too imprecise branches left?".]
Figure : Activity diagram of the CFG generation process. A series of methods are used
to create a safe representation while trying to be as precise as possible.
. building the control flow graph
For the analysis of the binary object the framework needs a fitting graph represen-
tation of the application code. As parts of the binary code will be rewritten the
analysis must be correct and safe because any uncertainty may result in modifica-
tions that may violate the correctness requirement of the approach. A safe analysis
clearly marks any uncertainties that may occur during the analysis of the program.
The framework combines four methods to generate a safe control flow representation
of the application. Figure  depicts the activities involved in this process. Initially
a linear code analysis is performed which extracts all basic blocks and simple control
flows between them. Indirect control flows are not resolved in this step, only marked
as uncertainties. The graph is then processed by a data flow analysis step which
performs a copy propagation based algorithm to detect all function return pointers.
The remaining indirect branches are then approximated by using alias analysis tools. The
result may be a precise representation of the indirect branches. However, some
indirect branches may still be very imprecise, e.g., pointing to possibly every memory
address. These indirect control flows are finally over-approximated by the insertion
00000018 <udp_new>:
 18:   b510            push    {r4, lr}
 1a:   2000            movs    r0, #0
 1c:   f7ff fffe       bl      memp_malloc
 20:   1c04            adds    r4, r0, #0
 22:   2800            cmp     r0, #0
 24:   d007            beq.n   36 
 26:   223c            movs    r2, #60 
 28:   2100            movs    r1, #0
 2a:   f7ff fffe       bl      memset
 2e:   2301            movs    r3, #1
 30:   1da2            adds    r2, r4, #6
 32:   425b            negs    r3, r3
 34:   77d3            strb    r3, [r2, #31]
 36:   1c20            adds    r0, r4, #0
 38:   bc10            pop     {r4}
 3a:   bc02            pop     {r1}
 3c:   4708            bx      r1
Figure : A small example assembler function and its basic blocks decoded from its
binary object file.
of a so-called hell node which allows for a safe representation of the control flow.
The following sections will describe each of these steps in the given order.
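To illustrate how copy propagation can identify a function return, the following sketch follows the value that the link register holds at function entry along one straight-line path. It is a strong simplification and not the algorithm of the framework: it handles a single path instead of the whole graph, assumes that callees leave the stack balanced and clobber only r0 to r3, r12 and lr, and uses a hypothetical (mnemonic, operands) instruction format.

```python
LR_ENTRY = "lr@entry"  # symbolic value of the link register at function entry


def is_return_branch(path):
    """Return True if the `bx` ending `path` jumps to the entry value of lr."""
    regs = {"lr": LR_ENTRY}
    stack = []  # the top of the stack is the end of the list
    for mnemonic, operands in path:
        if mnemonic == "push":
            # The lowest-numbered register ends up at the lowest address,
            # so it is pushed last in this model.
            for reg in reversed(operands):
                stack.append(regs.get(reg))
        elif mnemonic == "pop":
            for reg in operands:
                regs[reg] = stack.pop() if stack else None
        elif mnemonic in ("mov", "movs") and len(operands) == 2:
            regs[operands[0]] = regs.get(operands[1])  # copy propagation
        elif mnemonic == "bl":
            for reg in ("r0", "r1", "r2", "r3", "r12", "lr"):
                regs[reg] = None  # assumed to be clobbered by the callee
        elif mnemonic == "bx":
            return regs.get(operands[0]) == LR_ENTRY
        elif operands:
            regs[operands[0]] = None  # conservatively treated as unknown
    return False


# One path through the example function udp_new (branch at 24 taken).
UDP_NEW_PATH = [
    ("push", ["r4", "lr"]), ("movs", ["r0", "#0"]), ("bl", ["memp_malloc"]),
    ("adds", ["r4", "r0", "#0"]), ("cmp", ["r0", "#0"]), ("beq.n", ["36"]),
    ("adds", ["r0", "r4", "#0"]), ("pop", ["r4"]), ("pop", ["r1"]),
    ("bx", ["r1"]),
]
```

For this path the final `bx r1` is classified as a return, because r1 receives the value of lr that was pushed at function entry. A branch through a register loaded from memory stays unresolved and would be left to the later steps.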
.. Interprocedural Control Flow Graph
The interprocedural control flow graph incorporates the two representations control
flow graph and call graph of a program. Let us consider the program given as the sequence
of ARMv THUMB assembler instructions in Figure . By linearly parsing
the assembler instructions all basic blocks have been marked. The linear basic block
detection algorithm can directly be implemented following the definition of a basic
block:
. Every function start address defines the start of a basic block.
. Every branch instruction defines the end of a basic block.
. Every branch target defines the beginning of a basic block.
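The three rules above can be sketched directly in code and applied to the udp_new example. The sketch assumes a hypothetical instruction format (address, size, mnemonic, target) and treats the call instruction bl as a branch that ends a block. Targets outside the function, such as memp_malloc, are modeled as None because they are unresolved in the relocatable object.

```python
BRANCH_MNEMONICS = {"b", "beq.n", "bl", "bx"}  # assumption: calls end a block

# (address, size in bytes, mnemonic, branch target inside the code or None)
UDP_NEW = [
    (0x18, 2, "push", None), (0x1A, 2, "movs", None), (0x1C, 4, "bl", None),
    (0x20, 2, "adds", None), (0x22, 2, "cmp", None), (0x24, 2, "beq.n", 0x36),
    (0x26, 2, "movs", None), (0x28, 2, "movs", None), (0x2A, 4, "bl", None),
    (0x2E, 2, "movs", None), (0x30, 2, "adds", None), (0x32, 2, "negs", None),
    (0x34, 2, "strb", None), (0x36, 2, "adds", None), (0x38, 2, "pop", None),
    (0x3A, 2, "pop", None), (0x3C, 2, "bx", None),
]


def find_basic_blocks(instructions, function_starts):
    """Group instruction addresses into basic blocks by a linear sweep."""
    addresses = [addr for addr, _size, _mnemonic, _target in instructions]
    known = set(addresses)
    # Rule 1: every function start address begins a basic block.
    leaders = {addr for addr in function_starts if addr in known}
    for addr, size, mnemonic, target in instructions:
        if mnemonic in BRANCH_MNEMONICS:
            # Rule 2: a branch ends a block, so its successor begins one.
            if addr + size in known:
                leaders.add(addr + size)
            # Rule 3: every branch target begins a basic block.
            if target in known:
                leaders.add(target)
    blocks, current = [], []
    for addr in addresses:
        if addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(addr)
    if current:
        blocks.append(current)
    return blocks
```

Calling `find_basic_blocks(UDP_NEW, [0x18])` yields five blocks under these assumptions, starting at the addresses 18, 20, 26, 2e and 36.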
For the following algorithm let P be a program and I be its decoded instructions from
a process as described in Section ... Algorithm  demonstrates the generation
of the Interprocedural Control Flow Graph (ICFG) from the instructions of the
program under the restrictions of Section .. The main loop corresponds to the
linear instruction analysis applied by most approaches found in the literature. It
linearly parses all instructions (Line ) and generates new nodes/blocks whenever a
change in the control flow occurs. This can either be an unconditional or conditional
branch, which is handled by the check in Line . The set of nodes is extended by
the current block and a new node is generated for the following instruction. The
corresponding edges to the target of the branch and/or the following instruction are
generated as well. The target address is stored, as it marks the beginning
of a new basic block. If the target is inside a block we already encountered, the block
is split using the split_block method, which updates the blocks accordingly.
Whenever a target of a jump is encountered, upon linearly parsing the instructions,
a new basic block is created. This is handled by the Algorithm  in Line  et sqq.
Indirect jumps are handled by Line -. A new block is created for the following
instructions. However, no edge is created as the target is unknown. The block is then
stored in a set of uncertain blocks which needs to be further processed. The last
type of instructions supported are exception raising instructions. This can be, e.g.,
a system call to the OS. The getHandler method returns the corresponding handler
block for this instruction and a corresponding edge is created.
After all instructions of the program are parsed the generated control flow graph
is analyzed in order to fix all uncertain jumps. The analyseCFG method in Line
 takes care of this. How this is done is explained in the following section. All
remaining uncertainties are afterwards replaced with a call to a "hell node" in Line
. This is explained in the last section of this chapter.
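The effect of the hell node on the graph can be sketched as follows. The sketch assumes that the CFG is given as a mapping from block names to successor sets and that the caller supplies the set of blocks which an unknown jump might reach. How this candidate set is chosen is not fixed here, and the function name is hypothetical.

```python
HELL = "hell"  # name of the artificial node; an assumption of this sketch


def insert_hell_node(successors, uncertain_blocks, candidate_targets):
    """Over-approximate unresolved indirect jumps with one shared node.

    Every uncertain block gets an edge to the hell node, and the hell node
    gets an edge to every candidate target. The input graph is not modified.
    """
    cfg = {node: set(succs) for node, succs in successors.items()}
    if not uncertain_blocks:
        return cfg  # nothing left to approximate
    cfg[HELL] = set(candidate_targets)
    for block in uncertain_blocks:
        cfg.setdefault(block, set()).add(HELL)
    return cfg
```

Instead of one edge per (uncertain block, candidate target) pair, the graph only grows by one edge per uncertain block and one per candidate, while still covering every flow that the unresolved jumps could take.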
A CFG after running Algorithm  on an example program can be seen in Figure
. It already demonstrates two kinds of edges, which correspond to different kinds
of control flow forms that can occur inside a program. The dotted edges represent
conditional control flow. Conditional control flow is generated by conditional jump
statements, as seen inside the example program at address 24. Other sources of
conditional control flow may result from indexed jumps, which are used for, e.g.,
switch statements. The solid edges represent unconditional control flow including
procedure calls. Return edges from functions are not represented by edges inside
this representation.
The ICFG described here will be used throughout this thesis as this representation
offers a very convenient way for the analysis done inside the approach of this thesis.
The graph representation is context-insensitive as the call graph incorporated into
the ICFG is context-insensitive. Every procedure is represented only by one node
making some analyses less precise than would be possible using a context-sensitive
Algorithm  Generate CFG
procedure generateCFG(I)    ▷ Input: Instructions I of program P
    Output: CFG cfg = (N,E), Set<BasicBlock> uncertainBlocks;
    LocalVar: BasicBlock bcurrent, bnew, bjump; Instruction instr;
    LocalVar: Set<Integer> jumpTargets; Set<BasicBlock> blocks;
    bcurrent = newBlock(0);
    for instr ∈ I = (i1, i2, .., in) do    ▷ Linearly parse instructions by address
        bcurrent.instructions = bcurrent.instructions ∪ instr;
        if instr.type == Branch then    ▷ Un-/conditional Branch
            jumpTargets = jumpTargets ∪ instr.target;
            N = N ∪ bcurrent;
            bnew = newBlock(instr.address+instr.size);
            if instr.type == ConditionalBranch then
                E = E ∪ (bcurrent,bnew);
            end if
            bjump = getBlock(instr.target);
            if bjump != null then
                split_block(bjump,instr.target);
                E = E ∪ (bcurrent,getBlock(instr.target));
            end if
            bcurrent = bnew;
        else if instr.address ∈ jumpTargets then
            N = N ∪ bcurrent;
            bnew = newBlock(instr.address+instr.size);
            E = E ∪ (bcurrent,bnew);
            bcurrent = bnew;
        else if instr.type == IndirectJump then
            N = N ∪ bcurrent;
            uncertainBlocks = uncertainBlocks ∪ bcurrent;
            bcurrent = newBlock(instr.address+instr.size);
        else if instr.type == Exception then
            N = N ∪ bcurrent;
            E = E ∪ (bcurrent,getHandler(instr.exceptionType));
            bcurrent = newBlock(instr.address+instr.size);
        end if
    end for
    N = N ∪ bcurrent;
    analyseCFG(cfg,uncertainBlocks);    ▷ Data Flow Analysis to fix uncertainties
    insertHellNodes(cfg,uncertainBlocks);
    return cfg, uncertainBlocks;
end procedure
 control flow reconstruction
 18:   b510         push    {r4, lr}
 1a:   2000         movs    r0, #0
 1c:   f7ff fffe    bl      memp_malloc
 20:   1c04         adds    r4, r0, #0
 22:   2800         cmp     r0, #0
 24:   d007         beq.n   36 
 26:   223c         movs    r2, #60 
 28:   2100         movs    r1, #0
 2a:   f7ff fffe    bl      memset
 2e:   2301         movs    r3, #1
 30:   1da2         adds    r2, r4, #6
 32:   425b         negs    r3, r3
 34:   77d3         strb    r3, [r2, #31]
 36:   1c20         adds    r0, r4, #0
 38:   bc10         pop     {r4}
 3a:   bc02         pop     {r1}
 3c:   4708         bx      r1
(Figure labels: function udp_new; two edges marked "call".)
Figure : A part of the CFG of the example program of Figure  generated by Algorithm .
representation. However, for the purpose of identifying reconfiguration components,
the context-insensitive analysis implemented inside our framework suits best, as
the identification process is context independent. The context-sensitive version of
the ICFG will be used inside the optimization chapter for a precise analysis of the
reconfiguration overhead.
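The block construction described above can be illustrated with a minimal Python sketch. It is not the framework's implementation: it uses a three-pass, leader-based variant instead of the single-pass formulation of the algorithm, the names Instr and build_cfg are hypothetical, and bl is treated as a plain instruction so that call edges are left out. The instruction addresses and sizes are taken from the udp_new listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    addr: int
    size: int
    kind: str = "plain"          # "plain", "branch", "cond_branch" or "indirect"
    target: Optional[int] = None


def build_cfg(instrs):
    """Return (blocks, edges, uncertain) for an address-ordered instruction list."""
    addrs = {i.addr for i in instrs}
    # Pass 1: every branch target and every instruction after a branch starts a block.
    leaders = {instrs[0].addr}
    for i in instrs:
        if i.kind != "plain":
            leaders.add(i.addr + i.size)
            if i.target is not None:
                leaders.add(i.target)
    leaders &= addrs
    # Pass 2: cut the linear instruction stream at the leaders.
    blocks = {}
    for i in instrs:
        if i.addr in leaders:
            current = blocks.setdefault(i.addr, [])
        current.append(i)
    # Pass 3: derive edges from the last instruction of every block.
    edges, uncertain = set(), set()
    for start, body in blocks.items():
        last = body[-1]
        fall_through = last.addr + last.size
        if last.kind == "indirect":
            uncertain.add(start)              # target unknown, resolved later
            continue
        if last.target in blocks:
            edges.add((start, last.target))   # taken branch
        if last.kind != "branch" and fall_through in blocks:
            edges.add((start, fall_through))  # fall-through edge
    return blocks, edges, uncertain


# The udp_new listing; the two bl instructions are 4 bytes long, all others 2.
sizes = {0x1C: 4, 0x2A: 4}
program = [Instr(a, sizes.get(a, 2)) for a in (0x18, 0x1A, 0x1C, 0x20, 0x22)]
program += [Instr(0x24, 2, "cond_branch", 0x36)]
program += [Instr(a, sizes.get(a, 2)) for a in
            (0x26, 0x28, 0x2A, 0x2E, 0x30, 0x32, 0x34, 0x36, 0x38, 0x3A)]
program += [Instr(0x3C, 2, "indirect")]

blocks, edges, uncertain = build_cfg(program)
```

In this sketch the block ending in bx r1 is only recorded as uncertain; resolving it as a function return or connecting it to a Hell Node is left to the later analysis steps.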
.. Basic Block Augmentation
Some architectures contain specific assembler instructions that are only executed
if a specific condition flag is set. The ARM platform is an example of such an
architecture. Almost all ARM instructions can be executed conditionally, i.e., you
can specify that the instruction only executes if the condition code flags pass a given
condition or test. By using conditional execution, performance and code density can
0x24: add     r0,r2,#24
0x28: cmp     r0,r1
0x32: movgt   r0,1
0x36: movle   r0,0
0x40: b       lr
Figure : Control Flow Augmentation of ARM conditional instruction execution.
be increased and the amount of pipeline stalls can be reduced. In general an ARM
instruction has the following pattern:
< operator >< condition >< flags >< operands >
A corresponding instruction with all fields used is
movgt.n r1, r0
The operator is mov, the condition is gt (signed greater than), the instruction flag is .n and
the operands are r1 and r0. The semantics of this instruction can be summarized
as: move the contents of register r0 into register r1 only if the condition flags
indicate a signed greater-than result of the preceding comparison. The .n flag only
selects the narrow (16-bit) encoding; an update of the condition flags is requested
with the s suffix instead.
The default condition field value is represented by the AL mnemonic, which stands
for always execute. Instructions with the condition field set allow the conditional
execution of it depending on the current condition flags set.
Figure  shows a sequence of instructions containing instructions with condition
field set. The instructions movgt is only executed if the carry condition flag is cur-
rently one. The instruction movle, however, is only executed if it is zero. As these
instructions do no directly change the control flow of the processor they are normally
contained inside a basic block. However, for data flow analyses it is interesting to dis-
tinguish between the possible instruction executions just like it would be a separate
execution path. For our analyses we thus augment the CFG with an additional control
flow layer as seen in Figure . The original node inside the CFG stays the same in
order to avoid any incompatibilities with the binary rewriting algorithms. However,
 THUMB instructions do not contain the condition field. Thus, the control flow augmentation is
only needed for ARM instructions.
the node is marked as an augmented node n and a CFG Gaug(n) = (Naug,Eaug)
is generated for the basic block, which treats changes between conditional execution
flags as conditional edges inside the graph.
All data flow analysis steps are then executed on the control flow graph contained
inside the augmented basic blocks.
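A possible way to derive such an inner graph Gaug(n) is sketched below in Python. The function augment_block and the tuple encoding of instructions are assumptions made for illustration, and the edge construction is a deliberately conservative one: it does not exploit that gt and le are mutually exclusive, so it may contain more edges than the figure.

```python
def augment_block(instrs):
    """Build an inner CFG for one basic block.

    instrs: list of (address, mnemonic, condition), with "al" meaning
    "always execute". Returns (nodes, edges) where edges are index pairs.
    """
    # Consecutive instructions with the same condition field form one inner node.
    nodes = []
    for addr, mnemonic, cond in instrs:
        if nodes and nodes[-1]["cond"] == cond:
            nodes[-1]["instrs"].append((addr, mnemonic))
        else:
            nodes.append({"cond": cond, "instrs": [(addr, mnemonic)]})
    # From every node, control may reach each following conditional node
    # (its condition may hold or not) up to the first unconditional node.
    edges = set()
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            edges.add((i, j))
            if nodes[j]["cond"] == "al":
                break                    # this node is executed in any case
    return nodes, edges


# The instruction sequence of the augmentation figure.
block = [
    (0x24, "add", "al"),
    (0x28, "cmp", "al"),
    (0x32, "mov", "gt"),
    (0x36, "mov", "le"),
    (0x40, "b", "al"),
]
nodes, edges = augment_block(block)
```

The original basic block is left untouched; only the returned inner graph would be handed to the data flow analysis.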
.. Indirect Control Flow Target Resolution
Direct function calls and branches are easy to detect. Precise indirect control flow
detection is not as trivial. The framework simplifies the detection of indirect con-
trol flows by supporting two sources of indirect control flow precisely while over-
approximating the rest. The first type of indirect control flow, which the framework
detects, is intra-procedural indirect branches caused by switch statements. By de-
tecting compiler patterns as described in [CSF] it is possible to detect the indirect
control flow targets.
The second and most common type of indirect control flow for many architectures is the
function return. Some architectures as, e.g., the x architecture provide explicit
function return instructions. Functions can be called using the call instruction
and function returns are implemented using the ret instruction. The detection of
these control flows is straightforward. However, many architectures use indirect
register based jumps to implement the return behavior. The PowerPC and the ARM
architecture do not provide explicit instructions for function returns. The compiler
or the programmer manually needs to implement the return behavior using register
based jumps. The ABI of the architecture describes how these jumps need to be
implemented. Using this knowledge the resolution of these control flows can be
handled using a stack based copy propagation analysis. By treating the stack as
an additional set of registers in the context of the function (while keeping track of
the stack pointer), a copy propagation analysis can detect function returns. Table
 demonstrates this.
A typical function prologue and epilogue can be seen in Table  . The first instruction
stores the context of the calling function on the stack. Register r4 will be used
somewhere inside the method and is saved. The link register lr contains the return
address and is stored on the stack as well. The function epilogue consists of restoring
the context of the calling method and the indirect jump to the caller. The link
Instruction      Replaced Instruction   Table
push {r4, lr}    push {r4, lr}          ((stack96, r4), (stack100, lr))
...              ...                    ...
pop {r4}         pop {r4}               ((stack96, r4), (stack100, lr))
pop {r1}         pop {r1}               ((stack96, r4), (stack100, lr), (r1, lr))
bx r1            bx lr                  ((stack96, r4), (stack100, lr), (r1, lr))
Table : Detection of function return statements by copy propagation analysis.
register is loaded from the stack into a register and used for the indirect register
based jump. By doing a global copy propagation analysis, while handling the stack
as an additional register set, it is possible to precisely detect these return statements
as the jump to the link register can be identified safely as seen in the last instruction
of the table. The second column shows each
instruction after its registers have been replaced with their values (in this case other
register names), as copy propagation would do.
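The idea of the table can be sketched in Python for straight-line code. This is only an illustration under stated assumptions: the tuple encoding of instructions, the pseudo-operation "clobber" (standing for any instruction that overwrites a register with an unknown value) and the function find_returns are hypothetical, stack slots are addressed relative to the stack pointer at function entry, and register lists are assumed to be given in ascending register order.

```python
def find_returns(instrs):
    """Return the indices of bx instructions that jump to the entry value of lr."""
    regs = {}     # register -> symbolic value; missing means "still its entry value"
    stack = {}    # offset relative to the entry stack pointer -> symbolic value
    sp = 0
    returns = []

    def value(reg):
        return regs.get(reg, reg)

    for index, (op, *args) in enumerate(instrs):
        if op == "push":
            reg_list = args[0]
            sp -= 4 * len(reg_list)
            for k, reg in enumerate(reg_list):   # lowest register, lowest address
                stack[sp + 4 * k] = value(reg)
        elif op == "pop":
            for reg in args[0]:
                regs[reg] = stack.get(sp, "unknown")
                sp += 4
        elif op == "mov":
            dst, src = args
            regs[dst] = value(src)
        elif op == "clobber":
            regs[args[0]] = "unknown"
        elif op == "bx" and value(args[0]) == "lr":
            returns.append(index)                # provably a function return
    return returns


# Prologue and epilogue of the table; the function body overwrites r4 and lr.
function = [
    ("push", ["r4", "lr"]),
    ("clobber", "r4"),
    ("clobber", "lr"),
    ("pop", ["r4"]),
    ("pop", ["r1"]),
    ("bx", "r1"),
]
detected = find_returns(function)
not_a_return = find_returns([("mov", "r1", "r2"), ("bx", "r1")])
```

A full analysis would have to propagate these copies over the whole control flow graph and merge the tables at join points; the sketch only follows a single path.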
Alias-Analysis
Indirect jump targets that are not detected by the previous methods still introduce
uncertainties. While these uncertainties can be over-approximated with the
method described in the next section, which allows a safe program transformation,
execution time related analysis algorithms still demand the highest possible precision.
The approach of Liang Xu et al. solves this problem for a subset of applications.
They are able to calculate indirect jump targets under the restriction, that "the
target of an indirect branch is completely determined by a control flow path to
this indirect branch and is independent of intermediate program states" [XSS].
However, indirect jump targets are often dependent on program states as, e.g., the
use of function pointers often depends on some internal state of the program. Alias-
analysis can help to increase the precision of the target estimation.
 Not all registers may be used for the return statement as the calling context may be destroyed.
The ABI specifies a set of registers that can be used for this purpose which is typically excluded
from a method context.
Alias-analysis refers to the data-flow problem of finding pointers or registers
which point to the same location. It may be divided into two distinct subproblems:
"(1) disambiguating pointers that point to objects on the stack, and (2) disambiguating
pointers that point to the heap" [GH]. A considerable amount of work has
been done inside this area. Yet, most of the work is concerned with analyzing pro-
grams on source code level. The work of Brumley et al. [BN] first tackled the
alias-analysis problem on assembly level.
In general, if the values of two expressions are equal (r1 ≡ r2 mod 2^32), then r1 and
r2 are called aliases. The goal is to find all these aliases for indirect register based
jumps, which allows the identification of all values the jump address may take. This
is usually done by pairing the contents of a register at a program location with all
expressions this register may take by the use of alias analysis. However, this problem
is well-known to be undecidable. Thus, most approaches compute all may-aliases,
i.e., all pairs r1 and r2 which may be aliases. This is a conservative
approach; that is, if an alias relationship is possible, the approach includes it.
The approach of Brumley et al. [BN] uses abstract interpretation of the program
in order to compute all alias pairs. They define an abstract state machine and run
an interpreter on the abstract machine to detect all memory accesses and store their
values in pairs with the corresponding registers they were loaded into. The algorithm is
based on running all possible program paths, including loops, until the state of every
variable is saturated. Meanwhile, all references and register values are tracked and
stored to allow the identification of aliases. As the domain for every variable might
be quite large, the algorithm runtime is quite high for most programs. Limiting
the saturation of variables speeds up the process; however, precision is lost, which
leaves some uncertainties inside the control flow graph.
The remaining indirect branches still introduce uncertainties for program transfor-
mation. Thus, the next section describes the method used to safely handle them in
order to ensure the detection of all control flows for a safe program transformation.
.. Safe Over-approximation
Before any data flow analysis may be done on the ICFG of the application a safe
analysis must be guaranteed. Any occurrence of indirect control flow without know-
ing the possible branch targets may result in an invalid analysis or invalid program
transformation.
ELF object file (ELF header, ..., .rel.text, ..., .rel.rodata)
.rel.text:
addr | name     | class
10   | ip6_send | R_ARM_THM_CALL
24   | ex_func1 | R_ARM_ABS32
.rel.rodata:
addr | name     | class
40   | ex_func2 | R_ARM_ABS32
relocation targets after target extraction: ex_func1, ex_func2
Figure : Detection of relocation targets for indirect control flow.
Indirect control flow that is based on data inside the heap is a major problem.
Such indirect control flows are due to, e.g., function pointers. Resolving the possible
branch targets can partially be handled by alias analysis [FE, DMW]. Most
of the approaches, however, work on the source code of applications. Those that
analyze binary code cannot guarantee the precise and safe detection of all aliases.
In general, precise alias analysis is NP-hard in both its flow-sensitive and its
flow-insensitive form [Hor]. Thus, another solution needs to be found to guarantee the safe analysis
of the application.
By using the relocation information provided by the object files we may safely
identify all possible jump targets using the approach proposed by B. De Sutter
[SBB+]. The general idea is based on the knowledge that any function that may
be used as a target for an indirect control flow needs to be listed inside the relocation
table of the corresponding object file. This is due to the fact that the absolute address
of the function is not known a priori and is only determined by the linker. Every
relocation symbol contains class information, which allows the linker to identify
the operation to be used to calculate the final value for the corresponding memory
location. All relocation symbols of class Data (a corresponding class notation for
ARM ELF files is, e.g., R_ARM_ABS32 as seen in Figure ) correspond to symbols
which are stored as a data word inside the section they are defined in. These kinds of
symbols may be used as function pointers and thus may be a target of an indirect
branch. Figure  demonstrates the extraction of these branch targets from an ELF
object file.
All unknown indirect control flows inside the ICFG are replaced with a call edge to a
special node called Hell Node. The Hell Node marks unknown indirect control flows
inside the ICFG and is used to ensure a safe analysis. The Hell Node itself contains
edges to all entry basic blocks of the relocation targets extracted from the object files.
The use of these targets may not be precise, as not all of these relocation symbols
may be used as indirect branch targets. However, the overestimation ensures a safe
analysis of the application code as all actual program paths are covered.
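The over-approximation can be sketched in Python as follows. The relocation records reuse the entries of the relocation figure; the dictionary layout, the block names and the function insert_hell_node are hypothetical, and only R_ARM_ABS32 is treated as a data-class relocation here, which is an assumption of the sketch rather than a complete list.

```python
HELL = "hell"                       # name of the special Hell Node
DATA_CLASSES = {"R_ARM_ABS32"}      # assumption: only this class is considered


def insert_hell_node(edges, uncertain_blocks, relocations, entry_block_of):
    """Return a new edge set in which all unresolved indirect jumps go to HELL."""
    targets = {r["name"] for r in relocations if r["class"] in DATA_CLASSES}
    new_edges = set(edges)
    for block in uncertain_blocks:
        new_edges.add((block, HELL))                    # unknown jump enters hell
    for name in targets:
        new_edges.add((HELL, entry_block_of[name]))     # hell may reach any target
    return new_edges


relocations = [
    {"addr": 10, "name": "ip6_send", "class": "R_ARM_THM_CALL"},
    {"addr": 24, "name": "ex_func1", "class": "R_ARM_ABS32"},
    {"addr": 40, "name": "ex_func2", "class": "R_ARM_ABS32"},
]
# Hypothetical entry blocks of the three functions.
entry_block_of = {"ip6_send": "b_ip6", "ex_func1": "b_f1", "ex_func2": "b_f2"}

safe_edges = insert_hell_node({("b0", "b1")}, {"b1"}, relocations, entry_block_of)
```

Note that ip6_send is not connected to the Hell Node, because its only relocation is a direct call and not a stored data word.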
. summary
This chapter focuses on the generation of the interprocedural control flow graph
of a decoded binary. The framework used inside this thesis incorporates multiple
approaches to be as precise as possible. The main problem is the detection of indirect
jump targets. The framework developed in this thesis integrates several well known
concepts to detect various types of indirect control flow in order to guarantee a safe
program representation.
However, some indirect jump targets can only be approximated. This is done by
using alias analysis, which allows the identification of memory and register aliases.
All remaining indirect jumps are safely handled by introducing a Hell Node, which
clearly marks uncertainties inside the control flow graph and over-approximates the
actual set of jump targets. This ensures a safe program analysis and transformation
for the steps introduced inside the rest of this thesis.
6
COMPONENT MODEL
Reconfiguration needs software components to be used for reconfiguration, called
reconfiguration components. In the sequel they are simply called components. The
application is given as object code making it impossible to specify these components
on source code level. This chapter introduces the component model on the control
flow graph of the application. It introduces a method which allows the user to
select components from the binary code with the knowledge of the high level API
of the binary objects used. It, however, does not require the user to be an expert
in machine programming, as the proposed approach uses an abstraction of the low
level machine details of the binary code.
The chapter is structured in the following manner. The first part of this chapter in-
troduces the definition of a component by means of sub graphs of the application. It
then describes the process of deriving the high level semantics of binary application
code. The last part describes the process of identifying components by the use of
program constraints given by the user.
. defining reconfiguration components
Using the ICFG G = (N,E) of an application it is possible to do sophisticated pro-
gram analysis in order to optimize an application or even transform it into another
form. The goal of the methods described in the following sections is the identification
of suitable subgraphs of G, which may be treated as components during reconfigura-
tion, by the use of program analysis techniques. The minimal amount of information
provided by the object code of the application inherently requires the use of as little
information as possible to describe components. Typically, meta information associated
with components found in traditional reconfigurable systems is not visible to
the framework on the binary code level and is, thus, not considered here.

Definition .. formally specifies the notion of a component used in the rest of
this work by means of subgraphs and edges.
Definition .. (Component):
A component of an application given as G = (N,E) is a 2-tuple C = (Gc,Ei) with
Gc = (Nc,Ec) being the vertex-induced subgraph of G by the set of nodes Nc and
Ei = {(n1,n2) ∈ E | n1 ∈ N \ Nc, n2 ∈ Nc}.
Every component, thus, defines a subgraph Gc of the application's ICFG and a set
of edges Ei representing the control flow going into the component. This set of
nodes will further be called a component, which is not to be confused with the word
component used by, e.g., UML. The set of nodes specifies the application code the
component offers and the set of edges Ei defines the entry points of the component.
Outgoing edges are needed for the binary transformation step as well, however, they
are not contained in this definition and calculated on demand.
By specifying the set Nc a component is uniquely identified. The sets Ei and Ec
can be calculated based on the set Nc. Thus, whenever a set of nodes Nc is called
a component it implicitly stands for the 2-tuple C = (Gc,Ei). In the rest of this
work we will see that this definition of a component is sufficient to perform the
reconfiguration task under real-time constraints.
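That Ec and Ei follow from Nc alone can be seen in a short Python sketch of the definition. The function names and the example graph are hypothetical; outgoing edges are computed by a separate function, mirroring the remark that they are calculated on demand.

```python
def component(edges, nc):
    """Return (Nc, Ec, Ei) for the node set nc of a graph given by its edge set."""
    nc = frozenset(nc)
    ec = {(a, b) for a, b in edges if a in nc and b in nc}        # induced edges
    ei = {(a, b) for a, b in edges if a not in nc and b in nc}    # entry edges
    return nc, ec, ei


def outgoing_edges(edges, nc):
    """Edges leaving the component; not part of the definition."""
    return {(a, b) for a, b in edges if a in nc and b not in nc}


# Hypothetical ICFG with four nodes.
E = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "a")}
Nc, Ec, Ei = component(E, {"b", "c"})
Eo = outgoing_edges(E, Nc)
```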
The problem, which needs to be solved, is how these sets Nc are selected. One
possible solution would be the use of profile information, as it is used by dynamic
and static code optimizers [Ihl] to identify rarely executed parts of the applica-
tion. As an example a basic block ordering may be generated independent from the
input data processed by the application based on conservative branch prediction.
This would allow the automatic identification of rarely executed code parts for the
selection of sets for components. This would decrease the median number of recon-
figurations, because only rarely executed parts of the application would be used
for reconfiguration. However, one of the most important features of an embedded
application is not the median execution time, it is the worst case execution time,
which is not considered by the profile information using conservative branch predic-
tion. Additional execution time analysis would be needed similar to the calculation
of the blocking time in Chapter . Using these static code profiling techniques is,
however, not within the scope of this thesis. It is considered future work which, in addition to
the proposed approach in this thesis, allows for a more holistic approach.
[Figure: a software product uses the API of a binary object (e.g., an ARMv4 ELF object); inside the binary object, several components are connected by control flow.]
Figure : Allowing the user to select components: identification of components inside
binary objects by means of control flows.
The use of dynamic profile information may be beneficial. Getting dynamic profile
information based on a realistic input data set can, however, be problematic for
embedded systems. It is often not possible to generate a meaningful input data set
before the final system has been deployed and input data sets have been derived by
monitoring and storing program traces. Using unreliable input data sets may lead to very different
profile information as shown in [HCYC]. While this may not be a huge problem
for profile based optimization, basing the identification of reconfiguration components
on these profiles may be problematic. In order to meet early time to market
constraints, delaying design space decisions to the deployment phase is also
counterproductive. Simulation allows for these decisions to be made earlier. However, the
heterogeneity of embedded systems makes it additionally hard to simulate the execu-
tion of an embedded system. Additional implementation overhead will be introduced
in order to generate dynamic profile data for an embedded system, which is highly
undesired if early time to market constraints are to be met.
The solution proposed by this thesis for the identification of possible
components focuses on exploiting the domain knowledge of the developer to select
components from the binary objects. It is based on using constraints on variables,
symbols and parameters of the API chosen by the system developer to select subgraphs
of the application. The approach allows the developer to mark parts of the
application as reconfigurable and gives the developer the possibility to include and
exclude parts from the reconfiguration process. As the API is the interface the developer
uses for the integration of third party object code, as depicted in Figure ,
the API and the semantics of the API parameters and variables can be assumed to
be well known to the developer. Utilizing domain specific knowledge the developer
can then specify constraints on the value domains of these variables, which allow
the user to select code parts of the object file. The constraints can express value
ranges of variables which are, e.g., highly unlikely to be taken at runtime or can be
utilized to select code parts the developer wants to be reconfigurable at runtime be-
cause they are used in non time-critical parts of the application. As an example one
may consider a typical embedded device inside a control loop. Input values to the
algorithms of the embedded device cannot be calculated during static code analysis
as they depend on the environment the device is deployed in. Utilizing the domain
knowledge of the developer allows for high optimization potential using reconfiguration,
as this thesis will demonstrate. The specification of these value ranges is up
to the developer, giving the developer a sufficiently high degree of freedom to
configure the system.
Analyzing libraries provided by third party developers leads to the following obser-
vation. The binary objects frequently contain multiple components offering different
functionalities. They often depend on each other by means of control flow occurring
between them. Figure  depicts the general idea of identifying these components
from these objects. While the interface of the object is clearly defined by the API,
internal components are not visible. However, the control flow into those compo-
nents can be of interest for reconfiguration. The concept proposed in the rest of this
chapter offers a method to extract these internal components by means of control
flow detection utilizing domain specific user constraints.
As the object code representation of a program is commonly not understandable by
the developer, the constraints need to be given on some higher level the developer
can easily work with. More precisely, they are specified in the high level language
(e.g., C) the developer uses to access the binary objects over its API. This is done
by using high-level constraints on program variables in the programming language
of the application that are visible outside of the binary object. The constraints will
be checked statically against conditions reconstructed from the ICFG. Conditions violating
the constraints specified by the developer define entry points to components. In
order to enable this constraint-based component identification the semantics of the
application needs to be reconstructed. More precisely, the approach calculates invari-
ant (high-level) program expressions, independent of the path taken to a program
location, which are checked against the constraints specified by the developer.
 with its specification available in a higher programming language as, e.g., C-Header files.
The general concept of identifying components as subgraphs of the application pro-
posed in this thesis is, thus, summarized by the following steps, with the rest of the
chapter describing them in the same order:
1. Convert the assembler instructions of the program into a format suitable for
automatic processing. This will be explained inside the next section, which
introduces a specification language used for representing the semantics of assembler
instructions for various hardware platforms.
2. Replace as many low level expressions as possible with their equivalent high level
expressions, based on the header information of the application.
3. Use high level expressions of the user to allow the selection of program parts
by checking them against the high level expressions of the program, calculated
in the previous step.
4. Resolve ambiguities inside the selected components in order to prepare them
for automatic optimization.
The following section will introduce the abstract syntax notation used to represent
program behavior in a way suitable for automatic processing. The examples will be
based on the ARM ISA. A brief introduction to the ARM ISA is given in Section
A. inside the Appendix.
. reconstructing the application semantics
The framework uses the Semantic Specification Language (SSL) by Cifuentes et al.
[CS], which uses register transfer lists to model the semantics of the instructions of
an application. It allows "for the description of the semantics of a list of instructions
by means of statements or register transfers" [CS]. Low level expressions can be
modeled by an equivalent register transfer expression. The set of expressions is
then called a Register Transfer List (RTL). A basic expression is given by a single
register transfer which transfers information from one register to another. This can
be combined with arithmetic, logical, bitwise and ternary operations. For example
the following register transfer
*32* r2 := r1 << 8
assigns 32 bits of register one, shifted left by 8 bits, to register two. The group of
arithmetic, binary operations and unary operations (corresponding to the equivalent
C syntax) is given as:
[OP] := {+,−, ∗, /,<<,>>, |, &, ^, ~}
Another set of operations are memory access operations. The group of load opera-
tions for the ARM architecture can be given as
[LOAD] := {LDR,LDRB,LDRH}
which contains the various load operations supported by the processor. They only
differ in the size of the operand. An example 32-bit word memory access would look
like
*32* r2 := LDR r1
which loads the 32 bit word at the address of the value of register one into register
two. The set of store operations for the ARM architecture is given as
[STORE] := {STR,STRB,STRH}
An RTL expression containing a store operation always takes two arguments, the
memory location and the value. Both parameters may either be a register or an
absolute value. As an example the following expression may be given:
r1 STR r2
The expression semantically expresses that the contents of register one is stored at
the memory location pointed to by register two.
Condition codes are handled as typical registers. However, the size of the register
is limited to one bit. For example, the ARM architecture contains the condition
code flags N (negative condition code flag), Z (zero condition code flag) and C
(carry condition code flag) inside the Application Program Status Register. Some
operations as, e.g., comparison operations change these flags depending on the result
of the operation. The ARM comparison operation cmp r3,r4 would translate into
the following RTL:
*1* N := r3 < r4
*1* Z := r3 = r4
*1* C := r3 > r4
Some instructions result in both: an arithmetic operation on registers and an update
of the condition flags depending on the operation performed. As an example nearly
all ARM THUMB instructions can contain the suffix "s". For example, the mov
instruction may be used as movs, which would result in an additional update of
the condition flags based on the second operand of the instruction. The complete
register transfer list for the instruction movs r0, r1 would be:
*32* r0 := r1
*1* N := r1 < 0
*1* Z := r1 = 0
Branches are generally represented in the form of:
*32* PC := (COND = 1) ? disp : next_pc
where COND results from the condition of the branch, disp is the displacement to
the next address of the program counter on successful condition test and next_pc
is the address of the following instruction.
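The translation patterns shown above can be collected in a small Python sketch that emits register transfers as plain strings. It is not the SSL tooling of Cifuentes et al.: the function to_rtl and its handful of templates are hypothetical and reproduce only the simplified flag model used in the examples of this section.

```python
def to_rtl(mnemonic, *ops):
    """Translate one instruction into a list of register transfers (strings)."""
    if mnemonic == "mov":
        rd, src = ops
        return [f"*32* {rd} := {src}"]
    if mnemonic == "movs":                       # mov plus condition flag update
        rd, src = ops
        return [f"*32* {rd} := {src}",
                f"*1* N := {src} < 0",
                f"*1* Z := {src} = 0"]
    if mnemonic == "cmp":                        # only updates the flags
        a, b = ops
        return [f"*1* N := {a} < {b}",
                f"*1* Z := {a} = {b}",
                f"*1* C := {a} > {b}"]
    if mnemonic == "ldr":
        rd, rn = ops
        return [f"*32* {rd} := LDR {rn}"]
    if mnemonic == "beq":                        # conditional branch on Z
        target, next_pc = ops
        return [f"*32* PC := (Z = 1) ? {target} : {next_pc}"]
    raise ValueError(f"no RTL template for {mnemonic}")


cmp_rtl = to_rtl("cmp", "r3", "r4")
movs_rtl = to_rtl("movs", "r0", "r1")
beq_rtl = to_rtl("beq", "36", "PC + 2")
```

Keeping the templates in one table-like function makes it easy to see that every instruction, including its side effects on the condition flags, is expressed purely as assignments.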
The SSL offers a complete language to describe low level assembler instructions as
semantically equivalent register transfer lists. The register lists are unambiguous and
provide a perfect basis for further data flow analysis approaches. If we reconsider the
udp_new method of Figure  we can translate the instructions inside every basic
block into the corresponding RTL. The result for the first two basic blocks can be
seen in Table .
Instruction           RTL Expression                    du/ud-chain
1: mov r0, #2420      *32* r0 := 2420                   du(r0,1)={2}
2: add r1, r0, #4     *32* r1 := r0 + 4                 du(r1,2)={3}, ud(r0,2)={1}
3: ldr r0, [r1,#4]    *32* r0 := LDR (r1 + 4)           du(r0,3)={4}, ud(r1,3)={2}
4: cmp r0, #0         *1* N := r0 < 0                   du(Z,4)={5}, ud(r0,4)={3}
                      *1* Z := r0 = 0
                      *1* C := r0 > 0
5: beq 36             *32* PC := Z = 1 ? 36 : PC + 2    ud(Z,5)={4}
Table : Example program part with RTL annotation
The RTL representation of the instructions allows the framework to do more sophis-
ticated analyses on the application code as all semantics of an instruction are clearly
defined in a format suitable for automatic processing. In the next step expression
substitution is performed to generate higher level expressions from the initial RTL.
As described in [CS] we may use data flow properties as du- and ud-chains to
perform forward substitution in the following way.
Definition .. (Forward Substitution):
A definition of a register r = f1({ak}, i) at instruction i in terms of a set of registers
ak and operation f1, can be forward substituted at the use of r at instruction j,
s = f2({r, ..}, j), if the definition of r at instruction i is the unique definition of r that
reaches j along all paths of the application and no register ak has been redefined
along that path. The instruction at j may then be written as
s = f2({f1({ak}, i), ..}, j)
and the instruction at i would disappear. Formally written:
s = f2({f1({ak}, i), ..}, j) iff |ud(r, j)| = 1 ∧ ud(r, j) = i ∧
j ∈ du(r, i) ∧ ∀ak : ak-clear_{i→j}
Cifuentes et al. use forward substitution to perform decompilation of binary pro-
grams. The reconfiguration framework uses forward substitution on the register
expression sets out(Bi) during data flow analysis of every basic block Bi in order
to reconstruct higher level expressions for the annotation of conditional control flow
edges.
As an example of forward substitution, let us consider the program of Table . We
iteratively apply the substitution rule to every pair of instructions. As the substitution
prerequisite is met for the pair of instructions (1, 2), the instruction at 2
may be rewritten as:
2 : *32* r1 := 2420 + 4
Applying the result of this to the next instruction pair (2, 3), the substitution yields:
3 : *32* r0 := LDR(2420 + 4 + 4)
Continuing with the next instruction, the condition flag Z would be updated by the
substituted value:
*1* Z := LDR(2420 + 4 + 4) = 0
This is known as the $a_k$-clear property and is denoted as $a_k\text{-clear}_{i \to j}$.
Reconsidering the last instruction of Table , we can then replace the expression
*32* PC := Z = 1 ? 36 : PC + 2
by
*32* PC := (LDR(2420 + 4 + 4) = 0) = 1 ? 36 : PC + 2
The expression LDR(2420 + 4 + 4) = 0 may now be used as an annotation at the conditional control flow edge representing the control flow in case of a positive condition code test. The inverse expression can be annotated at the edge representing the false case.
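The substitution steps of this example can be sketched in a few lines of Python. This is only an illustrative sketch for a single straight-line block, not the framework's implementation: expressions are plain strings, instruction 1 (r0 := 2420) is an assumption inferred from the substituted results shown above, and the function name forward_substitute is hypothetical. Compound expressions are parenthesised on substitution, so the output contains more parentheses than the expressions in the text.

```python
import re

# Registers and flags that may be substituted (assumption: ARM style names).
REG = re.compile(r"\b(?:r\d+|Z|PC)\b")

# Straight-line RTL fragment as (target, expression) pairs.
program = [
    ("r0", "2420"),                 # 1: assumed constant load (not shown in the text)
    ("r1", "r0 + 4"),               # 2: add r1, r0, #4
    ("r0", "LDR(r1 + 4)"),          # 3: ldr r0, [r1, #4]
    ("Z", "r0 = 0"),                # 4: cmp r0, #0 (only the Z flag is tracked)
    ("PC", "Z = 1 ? 36 : PC + 2"),  # 5: beq 36
]

def forward_substitute(instructions):
    """Substitute the unique reaching definition into each later use."""
    env = {}   # register -> fully substituted defining expression
    out = []
    for target, expr in instructions:
        def repl(match):
            value = env.get(match.group(0))
            if value is None:
                return match.group(0)
            # Atoms are inserted as-is, compound expressions are parenthesised.
            return value if re.fullmatch(r"\w+", value) else f"({value})"
        new_expr = REG.sub(repl, expr)
        out.append((target, new_expr))
        # a_k-clear check: forget every definition that still refers to the
        # old value of the register that is redefined here.
        uses_target = re.compile(rf"\b{re.escape(target)}\b")
        env = {r: e for r, e in env.items()
               if r != target and not uses_target.search(e)}
        if not uses_target.search(new_expr):
            env[target] = new_expr
    return out

substituted = forward_substitute(program)
for target, expr in substituted:
    print(f"{target} := {expr}")
```

The last entry of substituted carries the condition that would be attached to the conditional control flow edge.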
Applying inter-procedural data flow analysis may further allow the substitution of return registers by the return domain of the corresponding method. Intra-procedural analysis, however, cannot do this, as the contents of the return registers r0 and r1 are unknown inside the in(Bi) set of the corresponding basic block.
. generating the high-level annotated control flow graph
The higher-level expressions generated by forward substitution in the previous section still contain registers as operands. A typical expression could look like:
*32* r3 = ((LDR(r0 + 4))LDRB 0) >> 4
While this expression is detailed enough for most program analysis, reverse engineering and decompilation tools, it is still based on simple register operations and is therefore not suitable for the identification of components by a system developer.
System developers, who will use RTL expressions to configure the reconfiguration process, will find it hard to work on such (still) low-level expressions. Developers usually operate on a higher level, e.g., on high level data structures in C or C++. Thus, the expressions need to be processed further to derive the corresponding high level operations on high level data types where possible. In the following, the concept proposed by this thesis is introduced, which allows the user to select components from the binary code of the application.
 The registers r0 and r1 are used for parameter passing and return registers inside ARM ABI
compliant applications.
The framework uses the header files of the binary libraries to extract all high level data structures defined. Assuming the application conforms to the ABI, it is possible to safely determine the complete memory layout of all data structures used. Information on the size of fundamental data types (e.g., unsigned int or unsigned char in C) and the endianness of the architecture is of major importance, as it allows the identification of fields in structured data types. Let us consider the C header file given in Listing . It shows the type definition of a simple linked list in C. It contains the bitfields t and v, an address field and a pointer to the next entry in the list.
typedef struct list {
    u8_t t : 4;
    u8_t v : 4;
    ip_addr addr;
    struct list *next;
} __attribute__ ((packed)) list;
u8_t contains(list *l, ip_addr *addr);
Listing : Example C-Header containing a type definition of
a structure containing bit fields, pointers and at-
tributes.
In order to support C expressions the framework needs to support type definitions,
global variables, structures, multi-level pointers and some major attributes, which
influence the memory layout. The extracted type information of the example header
is depicted in Table .
Field Name Type Bitsize Offset
t u_t  
v u_t  
addr ip_addr  
next *mask  
Table : Extracted type information for struct list of Listing .
 The attribute packed of Listing  is an example attribute which needs to be supported. It changes
the memory layout of a structure in the way that the least amount of memory will be occupied.
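How bit sizes and bit offsets of a packed structure can be derived is sketched below. This is a minimal illustration, not the framework's header parser: the field sizes (4 bit bitfields, a 32 bit ip_addr and a 32 bit pointer) are assumptions for a 32 bit ARM target, the function name packed_layout is hypothetical, and the placement of bitfields within a byte is compiler specific and not modelled here.

```python
def packed_layout(fields):
    """Return (name, type, bitsize, bit offset) rows for a packed structure.

    With the packed attribute no padding is inserted, so every field starts
    directly behind its predecessor.
    """
    rows, offset = [], 0
    for name, type_name, bitsize in fields:
        rows.append((name, type_name, bitsize, offset))
        offset += bitsize
    return rows

# Assumed field sizes for the list structure of the example header.
LIST_FIELDS = [
    ("t", "u8_t", 4),
    ("v", "u8_t", 4),
    ("addr", "ip_addr", 32),
    ("next", "list *", 32),
]

layout = packed_layout(LIST_FIELDS)
for row in layout:
    print("{:<5} {:<8} {:>3} {:>3}".format(*row))
```

Under these assumptions addr starts at bit 8 (byte 1) and next at bit 40 (byte 5).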
Additionally, the signatures of all global methods are extracted, e.g., that of the contains method in Listing . Using the Procedure Call Standard of the ABI, the framework creates an alias for each corresponding parameter register. For the ARM architecture, register r0 will get the alias l while register r1 will get the alias addr. The RTL notation does not contain a notion of types, as the contents of a register are only treated as a 32 bit vector, regardless of the use of the register. The reconfiguration framework, however, associates a high level type with every variable introduced. For example, the type of the alias l will be a pointer to a list structure. The type of the alias addr will correspondingly be a pointer to the ip_addr structure.
.. High Level Expression Detection and Normalization
Operations on fundamental data types passed inside registers to a function may easily be substituted by their high level alias. However, operations on aggregated data types introduce a set of problems if the corresponding high-level expression is to be reconstructed. Alignment restrictions, pointers, bit fields and arrays result in a large number of assembler instruction sequences for even simple load operations on these kinds of data structures. Although only a finite set of idioms is used by compilers, theoretically an unlimited number of idioms exists which correspond, e.g., to a simple load operation. As a very simple example, let us consider the following RTL expression, which loads a byte from the memory location pointed to by register r0 and shifts the value by 8 bits to the left and afterwards to the right:
*32* r1 := ((LDRB r0) << 8) >> 8 ()
Semantically, the expression is equivalent to *32* r1 := LDRB r0, as the shift operations on the byte loaded from memory nullify each other. An unlimited number of shift operations may be added in the same way as in Expression  by, e.g., a loop inside the program. Hard coding these idioms is not feasible. Thus, the framework introduces a set of fundamental, semantically equivalent reduction patterns, which are applied to the expressions in an arbitrary order as long as there exists a substitution that may be applied. This step is called expression normalization. The following sections introduce the normalization steps by category.
The first function parameter on ARM is passed in register r0. For a function void s(int value) the expression r0 in the first use of r0 may, thus, simply be replaced by the alias value, being a 32 bit signed integer.
.. Memory Access Patterns
Words stored in memory may be loaded into registers in different ways. Sometimes direct load word operations are used; sometimes, because of alignment restrictions, the word is built by loading and combining each byte of the word. An unlimited number of possible instruction sequences exists for this operation. Endianness also creates different patterns for accessing words in main memory.
Typically, if a program wants to load a 32 bit word from main memory which cannot be loaded in one operation due to memory alignment restrictions, the RTL expression will look like this on Little-Endian architectures:
*32* r3 := ((LDRB (l + 3)) << 24) | ((LDRB (l + 2)) << 16)
         | ((LDRB (l + 1)) << 8) | (LDRB l) ()
In order to reconstruct the corresponding high level instruction, a set of patterns is used. Table  lists the RTL normalization patterns that are introduced here for a Little-Endian architecture. Similar patterns can be given for Big-Endian architectures. Patterns M1-M3 are given in two versions, which look similar at first. Version A corresponds to consecutive load operations with the addend increased by one. Version B corresponds to consecutive load operations with constant addend but increased load location. Version B is typical for programs that contain pointer arithmetic inside the source code.
Pattern M introduces a new load operation. The standard RTL set of load opera-
tions is extended by the LDRT operation:
[LOAD] := {LDR,LDRB,LDRH}∪ {LDRT }
This operation resembles the operation of loading a word from main memory con-
sisting of three consecutive bytes. This kind of instruction can typically not be found
inside ISAs and is only used for the normalization process.
We may use the patterns to normalize expression :
Type  Pattern                                          Replacement       Condition
M1A   ((LDRB(τ + m)) << 8) | φ | (LDRB(τ + n))         LDRH(τ + n) | φ   m = n + 1
M1B   ((LDRB((τ + 1) + n)) << 8) | φ | (LDRB(τ + n))   LDRH(τ + n) | φ
M2A   ((LDRB(τ + m)) << 16) | φ | (LDRH(τ + n))        LDRT(τ + n) | φ   m = n + 2
M2B   ((LDRB((τ + 2) + n)) << 16) | φ | (LDRH(τ + n))  LDRT(τ + n) | φ
M3A   ((LDRB(τ + m)) << 24) | φ | (LDRT(τ + n))        LDR(τ + n) | φ    m = n + 3
M3B   ((LDRB((τ + 3) + n)) << 24) | φ | (LDRT(τ + n))  LDR(τ + n) | φ
Table : Memory Access RTL Normalization Patterns for a Little-Endian architecture. τ: arbitrary RTL expression. φ: RTL expression containing only expressions connected with the binary or operator, or the empty expression. n, m: constant numbers.
*32* r3 := ((LDRB (l + 3)) << 24) | ((LDRB (l + 2)) << 16) | ((LDRB (l + 1)) << 8) | (LDRB l)
   =[M1A] ((LDRB (l + 3)) << 24) | ((LDRB (l + 2)) << 16) | LDRH(l)
   =[M2A] ((LDRB (l + 3)) << 24) | LDRT(l)
   =[M3A] LDR(l) ()
Using the normalization patterns the alignment restrictions of the hardware can be
abstracted away making further analyses easier.
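The merging of byte loads into wider loads can be illustrated with a small fixpoint loop. The sketch below only covers version A of the memory access patterns for a Little-Endian architecture and uses a simplified term representation that is an assumption of this example: each operand of the binary or is a tuple (width in bytes, byte offset, left shift in bits). The function name normalize_loads is hypothetical.

```python
NAMES = {1: "LDRB", 2: "LDRH", 3: "LDRT", 4: "LDR"}

def normalize_loads(terms):
    """Merge byte loads into wider loads until no pattern applies anymore.

    terms: iterable of (width_bytes, byte_offset, shift_bits), all operands
    being connected by the binary or operator.
    """
    terms = set(terms)
    changed = True
    while changed:
        changed = False
        for width, n, shift in sorted(terms):
            if shift != 0 or width >= 4:
                continue
            # Byte located directly above the value loaded so far,
            # shifted into the next higher byte position.
            high = (1, n + width, 8 * width)
            if high in terms:
                terms -= {high, (width, n, shift)}
                terms.add((width + 1, n, 0))
                changed = True
                break
    return terms

def render(terms):
    return " | ".join(
        f"({NAMES[w]}(l + {n}) << {s})" if s else f"{NAMES[w]}(l + {n})"
        for w, n, s in sorted(terms, reverse=True)
    )

unaligned_word = {(1, 3, 24), (1, 2, 16), (1, 1, 8), (1, 0, 0)}
merged = normalize_loads(unaligned_word)
print(render(merged))
```

Terms that do not form a consecutive byte sequence are left untouched, so the loop terminates as soon as no further merge is possible.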
.. Arithmetic and Binary Patterns
Pattern M in Table  combines two types of normalizations for shift operations.
Depending on the shift amounts n and m the result may either just be a cast to a
smaller bit-size or a cast and a shift operation. The two latter normalizations cases
in M contain a normalization of the binary expression. M is a rule for arithmetic
 component model
Type Pattern Replacement
M ∗b ∗ (τ << n) >> m
∗b−n ∗ τ if n =m
∗b−n ∗ τ << (n−m) if n > m
∗b−n ∗ τ >> (m−n) if n < m
M n [OP] m
n ∗m if [OP] = ∗
n+m if [OP] = +
· · · · · ·
n ^m if [OP] = ^
Table : Arithmetic RTL Normalization Patterns for a Little-Endian architecture. τ arbi-
trary RTL expression. n,m constant numbers.
and binary operation reductions, which are not covered by previous rules. It applies
for all arithmetic and binary operations on constant operands. The result is a new
constant which is calculated by applying the operation to the constants values.
In order to illustrate the patterns, let us consider the following example normalization:
*32* r3 := *16* (τ << (4 * 2)) >> 8
   =[M5] *16* (τ << 8) >> 8
   =[M4] *8* τ
Using the normalization patterns, the expression is reduced to a simple 8-bit cast of the expression τ. This implies that the register r3 may only take values which can be represented by the lowest 8 bits at this point inside the program, thus reducing the domain size of the register. This is especially important if the expression is used inside a constraint checker, as the size of the input domain heavily influences the performance.
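The two arithmetic rules can be sketched as a bottom-up rewrite on a small expression tree. The tuple encoding used here, ("const", v), ("var", name), ("op", operator, left, right) and ("cast", bits, expression), is an assumption of this example and not the framework's internal representation; only a few constant operators are covered.

```python
import operator

# Operators folded by the constant rule (a subset, chosen for illustration).
OPS = {"*": operator.mul, "+": operator.add, "-": operator.sub, "^": operator.xor}

def normalize(e):
    """Apply constant folding and shift pair reduction bottom-up."""
    if e[0] == "op":
        _, op, a, b = e
        a, b = normalize(a), normalize(b)
        if op in OPS and a[0] == "const" and b[0] == "const":
            return ("const", OPS[op](a[1], b[1]))          # constant folding
        return ("op", op, a, b)
    if e[0] == "cast":
        _, bits, inner = e
        inner = normalize(inner)
        is_shift_pair = (
            inner[0] == "op" and inner[1] == ">>" and inner[3][0] == "const"
            and inner[2][0] == "op" and inner[2][1] == "<<"
            and inner[2][3][0] == "const"
        )
        if is_shift_pair:                                   # *b* (tau << n) >> m
            tau, n, m = inner[2][2], inner[2][3][1], inner[3][1]
            if n == m:
                return ("cast", bits - n, tau)
            if n > m:
                return ("cast", bits - n, ("op", "<<", tau, ("const", n - m)))
            return ("cast", bits - n, ("op", ">>", tau, ("const", m - n)))
        return ("cast", bits, inner)
    return e

tau = ("var", "tau")
example = ("cast", 16,
           ("op", ">>",
            ("op", "<<", tau, ("op", "*", ("const", 4), ("const", 2))),
            ("const", 8)))
reduced = normalize(example)
print(reduced)
```

Because the children are normalized first, the constant product is folded before the shift pair is inspected, which mirrors the order of the two steps in the example.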
.. High Level Variable Substitution
Let us consider the memory layout of the list structure from Listing  depicted in Figure . The bitfields t and v share the first byte. The two 32 bit fields addr and next follow directly behind.
[Figure: byte 0 holds the bitfields t and v, bytes 1 to 4 hold addr, bytes 5 to 8 hold next.]
Figure : Little Endian Memory Layout of the list structure for the ARM EABI. Memory addresses increase from left to right.
Loading the addr field from the structure involves loading every single byte separately on most hardware architectures, as loading the complete word in one operation is often not possible due to memory alignment restrictions for word operations. The corresponding RTL expression after applying all normalization steps may look like Expression  with l pointing to the beginning of the structure:
*32* r3 := LDR (l + 1) ()
As the memory layout is known after parsing the header files, the framework looks inside Table  for a field with the corresponding offset and size. As the offset in Expression  is one byte and the load size is 32 bits, the expression is replaced by its corresponding high level expression following rule M6 in Table :
*32* r3 := l->addr
We call this process variable substitution. The term l->addr now defines a new variable from a 32 bit domain.
Sometimes matches are not as trivial to detect as in the previous example. Let’s
consider the small C code in Listing .
struct list* l;
int i = (l->addr & 0xff00) >> 8;
Listing : Example of a structure access.
A compiler without optimization would convert the code into a list of assembler instructions that load the whole word from memory as in Expression . It would then use a binary "and" operation for masking the corresponding bits and afterwards shift the result by 8 bits to the right. The corresponding high level RTL expression with variable substitution can easily be constructed from these operations. An optimizing compiler, however, would just load the corresponding byte, resulting in the following RTL expression for Little-Endian architectures:
*32* r3 := LDRB(l + 2)
In order to reverse engineer the high level expression, the framework tries to apply variable substitution by checking for a valid field that is accessed by the load operation and adding the corresponding bit mask and shift operation to the RTL expression. This requires looking up the bit offset and bit size of all potential fields. The expression replacement is given in rule M7 in Table . The resulting expression after applying rule M7 would then be
*32* r3 := (l->addr & 0xff00) >> 8
Among the set of rules introduced here, this process is the only replacement step that adds operations to the RTL expression, thus increasing its complexity. All other rules reduce the length of the RTL expressions.
Detecting accesses to bitfields inside structures poses another difficulty. In most high level languages, bitfields may consist of an arbitrary number of bits, only limited by the bit size of the biggest fundamental data type. A typical pattern used by most compilers is given in rule M8 of Table . The first two conditions of the rule make sure that the loaded word contains the bit field which is accessed. The bitmask b needs to correspond to the correct bits inside the word, in the sense that the complete bitfield is extracted. Finally, the shift amount r must shift the bitfield by the correct number of bits. If all of these conditions are met, the expression term may safely be replaced by the corresponding high level bitfield access. The following example demonstrates the replacement rule M8:
*32* r3 := (LDRB(l) & 0xf0) >> 4
   =[M8] l->t
The bitmask 0xf0 masks the uppermost four bits corresponding to the t bitfield, and the shift operation shifts the extracted bitfield by the right number of bits. The conditions of rule M8 are met and the term may safely be replaced by the high level expression.
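The field lookup behind variable substitution can be sketched as follows. The sketch is a simplification and not the exact rule conditions of the thesis: the layout values are assumed from the packed list structure (bit sizes and bit offsets), the bitfield check computes mask and shift directly instead of using the rule's formula for b, and bitfields are assumed to be placed as in the example of the text, where t occupies the upper four bits of the first byte. All function names are hypothetical.

```python
# (field name, bitsize, bit offset), assumed from the packed list structure.
LIST_LAYOUT = [("t", 4, 0), ("v", 4, 4), ("addr", 32, 8), ("next", 32, 40)]

def substitute_load(load_bits, byte_offset, base="l", layout=LIST_LAYOUT):
    """Replace a load that covers exactly one field by the field access."""
    for name, bitsize, offset in layout:
        if bitsize == load_bits and offset == 8 * byte_offset:
            return f"{base}->{name}"
    return None

def substitute_bitfield(load_bits, byte_offset, mask, shift,
                        base="l", layout=LIST_LAYOUT):
    """Replace (LOAD(base + offset) & mask) >> shift by a bitfield access."""
    n = 8 * byte_offset
    for name, bitsize, offset in layout:
        inside = 0 <= offset - n and offset - n + bitsize <= load_bits
        if not inside or bitsize >= load_bits:
            continue
        r = load_bits - (offset - n + bitsize)   # shift amount as in the text
        if shift == r and mask == ((1 << bitsize) - 1) << r:
            return f"{base}->{name}"
    return None

print(substitute_load(32, 1))                 # LDR(l + 1)
print(substitute_bitfield(8, 0, 0xF0, 4))     # (LDRB(l) & 0xf0) >> 4
```

A load or mask that matches no field yields None, in which case the RTL expression would simply stay unchanged.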
Many more patterns may be defined for optimized assembler code. However, the patterns M1-M8 defined in the last sections already allow for a high detection rate of higher level expressions inside the RTL expressions generated by the framework. The evaluation chapter will give an overview of the annotation ratio for some example applications.
.. Global Variable Detection
In the previous sections, the high level data type (e.g., pointer to a structure) of a register was known from the parameter list of the method analyzed. Often, components inside a program communicate using the heap. Dynamically allocated objects can be very hard to identify and are out of the scope of this thesis. However, global variables can be detected very efficiently, as they are statically placed inside the heap region of the application. As the final location of such a variable is unknown prior to link time, the object files keep the symbol names inside the relocation section of the ELF header. If the framework detects an instruction which loads a word from such a relocation place, the load operation inside the RTL expression is replaced with the symbol name. As every variable inside an RTL expression is associated with a type inside the reconfiguration framework, the default type of the symbol is an unsigned integer. If the framework detects a unique definition of the symbol inside the header files, the type is replaced with the type found inside the header.
Type  Pattern                    Replacement                              Condition
M6    [LOAD](l + n)              l->[field]                               if s = bitsize([field]) ∧ n = offset([field])
M7    [LOAD](l + n)              (l->[field] & (((2^s) − 1) << m)) >> m   if m = bitsize([field]) − (n − offset([field])) − s
M8    ([LOAD](l + n) & b) >> r   l->[field]                               if offset([field]) − n < s ∧ s > bitsize([field])
                                                                          ∧ r = s − (offset([field]) − n + bitsize([field]))
                                                                          ∧ b = (2^(8 − (offset([field]) − n)) − 1) − (2^r)
Table : Structure RTL Normalization Patterns for a Little-Endian architecture. n is the bit representation of the load offset. [field] is the field in the structure pointed to by l which fulfills the constraints of the replacement pattern. s is the bitsize of the load operation [LOAD].
. constraint-based component identification
For every conditional control flow edge e ∈ E of the application's ICFG, the framework calculates the SSL expression c(e) = φ as described in the last section. The expressions describe conditions in a high level notation which must be met for the edge to be taken at runtime. They contain literals x1, ..., xn corresponding to variables used by the application at runtime. An example of this annotation can be seen in Figure . The edges of the annotated control flow graph contain the literal x1 = eth_hdr->type. The expression c((bb2,bb3)) = eth_hdr->type ≠ 0x806, for example, specifies the condition that needs to be fulfilled for the program to take the edge at runtime.
The SSL expressions are now used to allow the system developer to define reconfigurable parts of the application independently of the availability of the source code. The idea is to allow the developer to specify SSL expressions which define constraints on the input parameters of methods and global variables used by the API of a third party object. These constraints can be used to specify ranges of values for parameters and global variables. Some behavior of the object code will depend on these value ranges (expressed by control flows occurring between basic blocks) and may violate these value constraints. The goal is to find exactly those edges which violate the constraints given by the developer and to use these edges as entry points to reconfigurable components. If the developer can guarantee that these constraints are never violated at runtime, the code could safely be removed from the application. However, the constraints given by the developer do not necessarily need to hold at runtime, as values outside the specified ranges may still be passed to the API methods. This is the reason why reconfiguration needs to take place: to ensure that the code functionality is still maintained for violated constraints.
Additionally, it is possible to specify the names of methods in order to mark these methods as reconfigurable. Listing  shows a valid constraint set, which is used as a running example inside this section. The ABNF of the input format is listed inside the Appendix.
/* constraints for API method ip4_input */
[ip4_input]
(ip4_hdr->_ttl_proto & 0xff) != 0x6

/* constraints for the API method ethernet_input */
[ethernet_input]
eth_hdr->type != 0x86dd

/* globally valid constraints */
[__global__]
netif->ttl < 200
@tls_recv
Listing : Example Constraint Set. The corresponding ABNF can be
found in the Appendix in Listing .
Listing  demonstrates the three types of constraints supported by the approach.
The SSL constraint eth_hdr− > type 6= 0x86dd in line  is a constraint on the
literal eth_hdr− > type of the method ethernet_input. All SSL expressions
containing the literal originating from the ethernet_input method will be checked
against the user constraints in the following way. Let φ be the SSL constraint of such
an edge e and ψ the input constraint of the developer, then the edge constraint will
be checked by testing the satisfiability of
c(e) = φ∧ψ
This is done for all input constraints and for all edges containing literals specified by the developer. Doing this for the edge constraints depicted inside Figure  with the corresponding constraint on eth_hdr->type of Listing  results in the following constraints:
c((bb2,bb3)) = eth_hdr->type ≠ 0x806 ∧ eth_hdr->type ≠ 0x86dd
c((bb2,bb4)) = eth_hdr->type = 0x806 ∧ eth_hdr->type ≠ 0x86dd
c((bb3,bb5)) = eth_hdr->type ≠ 0x86dd ∧ eth_hdr->type ≠ 0x86dd
c((bb3,bb6)) = eth_hdr->type = 0x86dd ∧ eth_hdr->type ≠ 0x86dd
The last constraint c((bb3,bb6)) is obviously not satisfiable anymore. Constraints can be as simple as in this example. However, much more complicated constraints can be used, which include multiple literals and ranges of valid values. The framework supports any kind of bit-vector manipulation on the literals used. The modified constraints are afterwards forwarded to a constraint solver. The framework offers an abstraction layer which allows the integration of different state of the art constraint solvers. Currently, the framework implementation integrates the Choco Constraint Satisfaction Problem Solver [cho], operating on abstract types, and the STP Constraint Solver [GD], which uses bit-blasting to solve constraints using a highly efficient SAT solver; "it performs array optimizations and arithmetic and boolean simplifications on the bit-vector formula before bit-blasting to MiniSat" [ES].
Figure : Part of the SSL annotated control flow graph of an Internet Protocol Stack developed for smart cards. The visible part depicts the API method ethernet_input.
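The satisfiability test of the conjunction of edge constraint and developer constraint can be illustrated without a solver for a single small literal. The sketch below enumerates the value domain instead of calling Choco or STP, so it only demonstrates the principle. The 16 bit domain of eth_hdr->type and the function name satisfiable are assumptions of this example.

```python
def satisfiable(edge_constraint, user_constraint, domain=range(1 << 16)):
    """True if at least one value fulfils both constraints."""
    return any(edge_constraint(x) and user_constraint(x) for x in domain)

# Developer constraint on the literal eth_hdr->type.
user = lambda t: t != 0x86DD

# Edge constraints of the annotated control flow graph excerpt.
edges = {
    ("bb2", "bb3"): lambda t: t != 0x806,
    ("bb2", "bb4"): lambda t: t == 0x806,
    ("bb3", "bb5"): lambda t: t != 0x86DD,
    ("bb3", "bb6"): lambda t: t == 0x86DD,
}

# Edges whose combined constraint cannot be satisfied.
unsatisfiable_edges = {e for e, phi in edges.items() if not satisfiable(phi, user)}
print(unsatisfiable_edges)
```

Enumeration is only feasible for tiny domains; for constraints over several 32 bit literals a bit-vector solver is required, which is why the domain size of each literal matters.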
The second type of constraint is the constraint netif->ttl < 200 of Listing . It is a constraint in the __global__ section of the constraint file. The symbol __global__ stands for globally valid constraints. Thus, this constraint is valid for all occurrences of the netif variable in all SSL expressions of the application. Edge constraints which contain the netif variable are checked in the same way as input parameter constraints.
The third type of constraint is the symbol constraint @tls_recv. It directly marks reconfiguration entry points. The symbols may be any kind of symbol occurring inside the executable section of the object files.
The first two constraints will be checked against SSL expressions inside the ICFG.
Some expressions will not be satisfiable anymore as demonstrated above. We define
the set
Rc = {e ∈ E | c(e) not satisfiable}
that contains all edges e whose SSL expression c(e) is not satisfiable. Let
Rs = {(v1, v2) ∈ E | v2 is a basic block that contains a symbol constraint}
be the set of edges which point to a basic block that is marked by a symbol constraint. We then define
R = Rc ∪ Rs
as the set of reconfiguration edges. The set R now contains all edges which violate developer constraints or which point to basic blocks marked by symbol constraints. By using value constraints on API parameters, the developer can thus influence the set R and, in consequence, manually select the code parts (sets of nodes, i.e., components) that shall be used for reconfiguration. This leaves the decision which code parts shall be used for reconfiguration under the control of the developer, who can utilize application domain knowledge in order to select suitable components.
Definition .. (Application Entry Nodes):
Let G = (N,E) be the ICFG of an application. Then by the set Nstart ⊆ N the set
of application entry nodes is denoted.
The set of application entry nodes is partially computed by the reconfiguration
framework. By default, it contains all basic block nodes which are mapped to the
memory locations of the interrupt handling routines. This includes the board reset
interrupt handler which is the standard entry point of the cpu after re-/start of the
hardware platform. However, if the application contains, e.g., a boot-loader not all
application entry points will be detected this way because a boot-loader typically
installs program code like, e.g., interrupt handlers at boot time of the application.
Thus, the developer can provide additional entry points to the framework. However,
boot-loaders may be categorized as self-modifying applications and are, thus, not
explicitly covered here.
Given these sets it is possible to define some important sets of nodes of the ICFG,
which will be used throughout the rest of the thesis:
Definition .. (Mandatory Set):
Let the set T of nodes be the set that can be reached from the application entry
nodes Nstart without taking any reconfiguration edge as
T = {n ∈ N : ∃w = (w1, ...,wn),w1 ∈ Vstart ∧ (wi,wi+1) ∈ E \ R∧wn = n}
This set defines the set of basic blocks that we call the Mandatory Set.
Figure : The Intermediate Components Nri for an example CFG based on definition ...
Definition .. (Intermediate Components):
For every reconfiguration edge ri ∈ R with ri = (ni1,ni2) ∈ E we define the set
Nri of nodes that can be reached over the reconfiguration edge without visiting a
node that is mandatory (inside set T): Nri = {k ∈ N : ∃w = (w1,w2, ...,wn)∧
w1 = ni2 ∧ (wj,wj+1) ∈ E∧wj /∈ T ∧wn = k}. We call these sets Intermediate
Components.
Both sets can be computed using a depth first search, starting at the start nodes Nstart for finding the Mandatory Set, or at the nodes {ni2} with ri ∈ R, ri = (ni1, ni2) for finding the Intermediate Components, respectively, using the restrictions inside the corresponding definition. The components of an example CFG can be seen in Figure . Three components have been calculated using a depth first search starting at the reconfiguration edges ri. As the name Intermediate Component suggests, these sets are only used intermediately, as they are not necessarily disjoint. This results in the problem of having no distinct mapping of a basic block to a component. Using these sets of Intermediate Components inside the reconfiguration approach described in the following chapters could create duplicate code segments, which is highly undesired.
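Both searches can be sketched with one iterative depth first search that is parameterised by the edges and nodes it must not use. The graph encoding (a successor dictionary), the function names and the small example graph are assumptions of this sketch.

```python
def reachable(starts, succ, blocked_edges=frozenset(), blocked_nodes=frozenset()):
    """Depth first search that skips blocked edges and blocked nodes."""
    seen = set()
    stack = [s for s in starts if s not in blocked_nodes]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for nxt in succ.get(node, ()):
            if (node, nxt) not in blocked_edges and nxt not in blocked_nodes:
                stack.append(nxt)
    return seen

def identify(succ, entry_nodes, reconf_edges):
    """Return the Mandatory Set and one Intermediate Component per edge."""
    mandatory = reachable(entry_nodes, succ, blocked_edges=frozenset(reconf_edges))
    intermediate = {
        edge: reachable([edge[1]], succ, blocked_nodes=frozenset(mandatory))
        for edge in reconf_edges
    }
    return mandatory, intermediate

# Hypothetical example: a and b are mandatory, x and y are entered over
# reconfiguration edges and share the successor z.
succ = {"a": ["b", "x"], "b": ["y"], "x": ["z"], "y": ["z"], "z": ["b"]}
mandatory, intermediate = identify(succ, ["a"], [("a", "x"), ("b", "y")])
print(mandatory, intermediate)
```

In this example the node z ends up in both Intermediate Components, which is exactly the overlap that has to be resolved afterwards.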
. ensuring disjoint components
In order to ensure a disjoint set of components, Algorithm  is used to generate distinct components from the set of Intermediate Components. The basic idea is to generate all possible intersections as long as there exist sets of basic blocks that are contained in more than one component.
In each intersection iteration (the while loop of Algorithm ) all possible intersections of the current working set S are calculated. Redundant intersections or empty intersections are not stored. At the end of the intersection step, all basic blocks contained inside any of the intersection sets Ki are removed from the components inside the working set S. The created components Ki then define the working set for the next intersection step. The iteration ends when the working set contains at most one set, as no new intersection can be computed anymore.
Algorithm  Component Identification
procedure generateComps(Nr1, ..., Nrn)        ▷ Input: Nri of definition ..
    Set S ← {Nr1, ..., Nrn}
    Set K ← {}                                 ▷ Temporary set of sets
    Set R ← {}                                 ▷ The set of output components
    while |S| > 1 do
        for all Si, Sj ∈ S, Si ≠ Sj do
            T ← Si ∩ Sj                        ▷ Build intersection
            if T ∉ K ∧ T ≠ {} then
                K ← K ∪ {T}                    ▷ Add the intersection set T to K
            end if
        end for
        for all Si ∈ S do
            Si ← Si \ (⋃ Ki∈K Ki)              ▷ Remove all sets in K from Si
            R ← R ∪ {Si}                       ▷ Add component to result
        end for
        S ← K
        K ← {}
    end while
    R ← R ∪ S                                  ▷ Add the last remaining set, if any
    return R
end procedure
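A compact Python sketch of the intersection procedure is given below. It follows the structure of the algorithm but makes two assumptions of its own that should be read as such: empty remainders are not added to the result, and the single set that may remain in the working set when the loop terminates is added to the result so that the innermost intersection is not lost. The function name and the example sets are hypothetical.

```python
from itertools import combinations

def generate_components(intermediate):
    """Split possibly overlapping node sets into pairwise disjoint components."""
    working = {frozenset(s) for s in intermediate}
    result = set()
    while len(working) > 1:
        # All non-empty pairwise intersections of the current working set.
        intersections = {a & b for a, b in combinations(working, 2)}
        intersections.discard(frozenset())
        shared = frozenset().union(*intersections)
        for s in working:
            rest = s - shared
            if rest:                  # assumption: drop empty remainders
                result.add(rest)
        working = intersections
    result |= working                 # assumption: keep the last remaining set
    return result

# Three overlapping components; node names encode the owners of each node.
Nr1 = {"a", "ab", "ac", "abc"}
Nr2 = {"b", "ab", "bc", "abc"}
Nr3 = {"c", "ac", "bc", "abc"}
parts = generate_components([Nr1, Nr2, Nr3])
for part in sorted(parts, key=sorted):
    print(sorted(part))
```

For this input two intersection steps are needed, and every node ends up in exactly one of the seven resulting components.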
During Algorithm , the intersections of sets of basic blocks are computed (T ← Si ∩ Sj). These sets define new components that will be used for the reconfiguration process. However, before the reconfiguration process is introduced, some important properties of the components need to be analyzed.
Lemma :
Let Nri and Nrj be Intermediate Components and Ki,j = Nri ∩Nrj be the in-
tersection. Then there exists no edge e going from Ki,j to the set Si or Sj with
Si = Nri \Ki,j and Sj = Nrj \Ki,j .
Proof. The proof of lemma  can be seen if we reconsider the construction of Nri
and Nrj . By contradiction let e = (n1,n2) be such an edge with n2 lying in Si but
not in Sj, with ri = (ni1,ni2) and rj = (nj1,nj2) as denoted in Definition ...
This means there exists a path from ni2 to n2 over n1. However, as there must also
exist a path from nj2 to n1 as n1 ∈ Ki,j it directly follows that there also exists a
path from nj2 to n2 by taking the edge e. Thus, n2 must have been inside the set
Kij. It follows the edge e does not exist.
We can directly conclude that edges going out of the set Kij are either control flows
to the Mandatory Set or to components unequal to Nri and Nrj .
Lemma :
Let Nri and Nrj be Intermediate Components and Ki,j = Nri ∩Nrj be the intersec-
tion. Let Ki,j and Kl,k be two arbitrary intersections and K(i,j),(l,k) = Ki,j ∩ Kl,k,
then the following equations hold true:
K(i,j),(i,k) = K(i,k),(i,j) = K(i,j),(j,k) = K(j,k),(i,j) ()
K(i,j),(k,l) = K(k,l),(i,j) = K(i,k),(j,l) = K(j,l),(i,k) = K(i,l),(j,k) = K(j,k),(i,l) ()
The equations of Lemma  directly follow from the commutativity and associativity
of the intersection operator. For simplicity we denote the set in Equation  as Ki,j,k
and  as Ki,j,l,k respectively.
Using Algorithm  it is possible to split up the intermediate components into single distinct components. For a better understanding, the intersection steps of the algorithm and the resulting components are illustrated in Figure . Part a) depicts the set of components after the first intersection step of the algorithm. The initial components have been modified and do not contain any duplicate nodes. The remaining sets are used for the next step of the algorithm. After the second intersection step the components K1,2, K1,3 and K2,3 are modified. The result is a set of disjoint components S1, ..., S6. However, we introduce dependencies between each of the components, which are defined by the control flow edges between them.
Figure : The intersection steps of Algorithm  illustrated on the example CFG of Figure . The corresponding direct dependency graph is displayed on the right side.
Definition .. (Dependency):
Given two components Si,Sj, if there exists an edge e = (n1,n2) with n1 ∈ Si,n2 /∈
Si and n1 /∈ Sj,n2 ∈ Sj we say Si directly depends on Sj, denoted Si → Sj. If there
exists a path w = (w1, ...,wn) with w1 ∈ Si ∧w2, ...,wn−1 /∈ Si ∧wn ∈ Sj we say
Si can execute on Sj, denoted Si ; Sj.
The corresponding direct dependency graph of the components extracted inside the
example graph can be seen on the right hand side of Figure . The dependency
information of the components will be used later inside the component optimization
step in Chapter . The relation can execute contains the information which compo-
nents can be reached from a component taking any possible path inside the control
flow graph. In contrast to the dependency graph the graph based on this relation
can contain cycles, which (in terms of reconfiguration) can cause cyclic reconfigura-
tion. Cyclic reconfiguration is problematic for real-time applications, as it can lead
to highly increased reconfiguration times making it unusable for most situations.
However, under certain conditions cyclic reconfiguration will be acceptable as the
reconfiguration time will still be tightly bounded. Section .. will cover this prob-
lem and acceptable conditions in detail. The relation can execute is useful for the
worst case reconfiguration time analysis performed inside the chapter as it allows
the enumeration to exclude paths from the analysis.
The dependency graph may not be completely connected. However, the maximum
path length can still be computed.
Theorem ..:
Let Nr1 , ...,Nrn be the intermediate components of Algorithm , then the amount
of intersection steps of Algorithm  cannot exceed n.
Proof. Let Nr1 , ...,Nrn be the intermediate components. After the first intersection
step we will end up with a maximum of n(n−1)/2 intersection sets Ki,j, as Ki,j = Kj,i.
Assuming that all of the resulting sets are distinct, it takes n steps to calculate all
combinations of sets (with each unique set being a new component) up to K1,2,3,..,n.
The algorithm ends at this step as this set is the only one computed in the last
phase.
Corollary  (Longest Path in the Direct Dependency Graph):
The longest path inside the direct dependency graph with n initial components
cannot exceed n− 1.
Proof. Based on Theorem .. we know that we may get at most n intersection
steps. During each intersection step we may get one additional layer of components
(the intersections). Thus the direct dependency graph may have at most n layers.
As there only exist edges from upper layers to lower layers, as stated by Lemma ,
the longest path thus may only have n − 1 edges.
. summary
This chapter gives a definition of a reconfiguration component used by the recon-
figuration framework. The components are generated from the original binary by
means of checking constraints defined by the system developer. The constraints are
given on global variables and input parameters in a high level language (in our ex-
ample C) and checked by a constraint-solver against constraints of the application.
In order to use these constraints on the low level assembler code of the application,
a higher level program representation is derived by means of forward substituting
assembler instructions. Conditional edges inside the control flow graph are then an-
notated with the corresponding invariant constraint and used for the identification
of reconfiguration components. This allows the system developer to fine-granularly
specify program parts as reconfiguration components. Additionally, the framework
allows the user to identify components using symbols (e.g., function names).
The second part of this chapter focuses on eliminating ambiguities inside the compo-
nents. This is done by an intersection algorithm which ensures that every component
contains a distinct set of nodes. These disjoint components are subject to optimiza-
tion in Section .
 The framework contains an intermediate abstraction layer to allow the use of multiple solvers.
7
RUNTIME RECONFIGURATION
This chapter describes the reconfiguration architecture and its software concepts.
As reconfiguration introduces additional overhead to the previously static system
the overall goal of the reconfiguration approach is to keep this overhead as small as
possible.
The first part of this chapter will introduce the reconfiguration architecture. In
the second part the integration into the OS used will be discussed. A check for the
schedulability of a periodic task set executing under the reconfiguration model of the
thesis will be given. The rest of the chapter focuses on the reconfiguration protocol
and the component replacement.
. the reconfiguration architecture
In general one may distinguish between two types of reconfiguration triggers. The
first one is a user-triggered reconfiguration. This kind of reconfiguration only hap-
pens on demand of the user. It is typical for server systems in which the user wants
to exchange the software, e.g., in order to upgrade components. The second type
is the application-triggered reconfiguration. Exchange of components happens on
demand of an application. This can be done by different means. Some applications
use scripts to perform reconfigurations, other applications trigger a reconfiguration
as they depend on some functionality that is currently not available. This kind of
reconfiguration is the target of the reconfiguration framework proposed inside this
thesis.
At runtime, transparently to the user, components are exchanged on demand. Fig-
ure  describes the architecture of the reconfiguration system. The reconfiguration
manager is the central part of the reconfiguration process. It is a static compo-
nent which is automatically integrated into the application. It offers three types of
functionalities:
• Control flow forwarding: Control flows between components and the Mandatory
Set are forwarded efficiently without run-time linking.
• Component loading: If a component needs to be added to the system the
reconfiguration manager loads the component from a reconfiguration server
and installs the component.
• Component replacement: The reconfiguration manager implements a replacement
strategy to enable the removal and addition of components at runtime.
Figure : The reconfiguration architecture and the possible control flows: (1.) control
flow from a component to the Mandatory Set, (2.) control flow between components,
(3.) control flow from the Mandatory Set to a component.
As components may be removed from the system at any time, the reconfiguration
manager must be able to intercept control flow between components in order to
avoid system failures. Basically, three types of control flows need to be handled by
the reconfiguration system. Control flow going from a component to the Mandatory
Code/OS (see 1. in Figure ) is a special case. As the mandatory code is not moved
within the physical address space, the corresponding control flow does not need any
intervention of the reconfiguration manager. The run-time overhead of these control
flows is static and very small.
Control flow occurring between components (see 2. in Figure ) involves the
reconfiguration manager. Consider the components S1 and S2 inside the figure.
Component S1 triggers a control flow to component S2. As the physical memory
location of a component changes during reconfiguration, the reconfiguration manager
is called first. The reconfiguration manager provides a special, assembler-based
routine named enter_comp (compare interface reconf_if in Figure ). The calling
convention for this routine can be found in the Appendix A.. This routine ensures
a safe control flow to a component. If the component is currently not loaded, a
reconfiguration request is issued, which will use the interface to the operating
system to load the component. The issuing thread context is saved on the stack and
stored by the reconfiguration manager to resume the thread upon completion of the
reconfiguration. When the control flow returns to the invoking component, the
reconfiguration manager intercepts the control flow again as the invoking component
may have been removed.
Figure : The time intervals of the reconfiguration protocol and the activity of the receive
(Rx) and transmit (Tx) lines.
Control flow from the Mandatory Code to a component (see 3. in Figure ) involves
the same steps as the previous one. It is handled by the same assembler routine, and
the call to the reconfiguration manager is also automatically added to the Mandatory
Code. A return to the Mandatory Code, however, is not intercepted.
. the reconfiguration protocol
The reconfiguration manager uses a simple reconfiguration protocol to reload com-
ponents from a reconfiguration server. An example for a reconfiguration and the
involved steps is depicted in Figure . The complete time of a reconfiguration
consists of the following time intervals:
1. trm is the time the reconfiguration manager uses to find a suitable
reconfiguration slot and call the OS-Interface to send the reconfiguration request.
2. tx is the time needed by the operating system to pass the data to the hardware
communication device. This may involve passing the reconfiguration packet
through multiple layers of a communication protocol.
3. tissue is the time to send the initial issuing reconfiguration request packet
over the physical communication channel to the reconfiguration server.
4. tos is the time needed by the operating system of the server to pass the packet
to the server application and generate the answer data packet.
5. tdata is the time needed to transfer a portion of the component over the
physical communication channel.
6. tr is the time needed by the system to pass the packet to the reconfiguration
manager including interrupt latencies.
7. tstore is the time needed to write the data into the flash memory (erase/write
cycle).
8. tack is the time needed to send the acknowledgment packet over the physical
communication channel.
The protocol uses a request packet for the initialization of the reconfiguration
process. The reconfiguration manager sends the ID of the component to be loaded to
the reconfiguration server in the first step. The reconfiguration server then starts to
transfer the parts of the component in portions of data. Each of these data packets
is acknowledged by the reconfiguration manager before the server sends the next
packet. This ensures that packets are only sent if the reconfiguration manager
is capable of handling the next packet in order to avoid buffer overflows (resulting
in lost packets) on the memory restricted device.
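The message exchange described above can be sketched as a small simulation. This is
a minimal model under assumptions, not the thesis implementation: the function name
transfer_component and the log format are hypothetical, and timing, headers and flash
writes are left out.

```python
def transfer_component(component: bytes, packet_size: int):
    """Simulate request, data portions and acknowledgments; return data and log."""
    log = [("manager", "REQUEST", 0)]          # manager sends the component ID
    received = bytearray()
    for start in range(0, len(component), packet_size):
        portion = component[start:start + packet_size]
        log.append(("server", "DATA", len(portion)))
        received += portion                     # real system: store portion in flash
        log.append(("manager", "ACK", 0))       # server waits for this before sending on
    return bytes(received), log

data, log = transfer_component(bytes(range(10)), packet_size=4)
for entry in log:
    print(entry)
```

Because the server only sends the next portion after an acknowledgment, the manager
never has to buffer more than one packet.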
Given the worst case execution time of each of the time intervals involved in the
reconfiguration process, it is possible to determine the worst case blocking time of
a thread which is waiting for a component to be loaded. Using the protocol above,
with a header overhead of dh for every packet and a packet size sp, the worst
case time bi a thread has to wait for component Si with size si and flash page size
Pf is given by:
\[
\begin{aligned}
b_i &= t_{static} + t_{full} + t_{residue} \\
t_{static} &= t_{rm} + t_x + t_{issue} \\
t_{full} &= \left\lfloor \frac{s_i}{s_p} \right\rfloor \cdot \left( t_{os} + (d_h + s_p) \cdot t_{data} + t_r + t_{erase} \cdot \frac{s_p}{P_f} + s_p + t_x + t_{ack} \right) \\
t_{residue} &= t_{os} + (d_h + s_{residue}) \cdot t_{data} + t_r + t_{erase} \cdot \left\lceil \frac{s_{residue}}{P_f} \right\rceil + s_{residue} + t_x \\
s_{residue} &= s_i \bmod s_p
\end{aligned}
\]
The time interval tstore contains the time for erasing a flash page (which is the
most time-consuming part as we will see) and writing the actual bytes to the page.
It can be approximated by the term terase · sp/Pf + sp. An evaluation of this will be
demonstrated on the reference implementation inside the evaluation chapter.
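As a worked example, the blocking time formula can be evaluated directly. The sketch
below transcribes the terms as stated; the function name blocking_time, the
dictionary keys and all numeric values are assumptions chosen for illustration only.

```python
import math

def blocking_time(s_i, s_p, d_h, P_f, t):
    """Worst case blocking time b_i; t maps interval names to per-unit times."""
    s_res = s_i % s_p
    t_static = t["rm"] + t["x"] + t["issue"]
    per_full_packet = (t["os"] + (d_h + s_p) * t["data"] + t["r"]
                       + t["erase"] * s_p / P_f + s_p + t["x"] + t["ack"])
    t_full = (s_i // s_p) * per_full_packet
    # The residue term is added as written, even if s_res is 0.
    t_residue = (t["os"] + (d_h + s_res) * t["data"] + t["r"]
                 + t["erase"] * math.ceil(s_res / P_f) + s_res + t["x"])
    return t_static + t_full + t_residue

# Hypothetical unit costs: every interval costs one time unit.
unit = {k: 1 for k in ("rm", "x", "issue", "os", "data", "r", "erase", "ack")}
print(blocking_time(s_i=10, s_p=4, d_h=2, P_f=4, t=unit))
```

With these values two full packets and one residue packet of two bytes are
transferred, so the result is 3 + 2 · 15 + 10 time units.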
. reconfiguration activities
The main reconfiguration logic is encapsulated inside the enter_comp method, which
allows control flow into components. Figure  illustrates these activities. The
reconfiguration manager first checks whether the called component is currently
loaded. It maintains a translation table T, which maps each component ID to the
physical memory address the component is loaded to. Loaded components have a
physical address > 0. If a component is currently not loaded, a reconfiguration is
triggered. The reconfiguration temporarily stops the currently executing thread by
saving its context and then loads the component. However, if a concurrent
reconfiguration is in progress, the thread is blocked until the thread which is
currently demanding a reconfiguration has finished its execution. This is depicted as
an acceptance test inside Figure . After the component has been loaded completely,
the blocked thread is resumed and the execution continues.
During a running reconfiguration, new reconfiguration requests are delayed until the
current reconfiguration is finished.
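The activities of enter_comp can be summarized in a simplified, single-threaded
model. This is a sketch under assumptions: the class ReconfManager and its fields are
hypothetical, thread blocking and context saving are only indicated by comments, and
a plain Python list stands in for the LRU structure.

```python
class ReconfManager:
    def __init__(self, base, slot_size, n_slots):
        self.base, self.slot_size = base, slot_size   # base must be > 0
        self.T = {}                    # component ID -> physical address, 0 = not loaded
        self.lru = []                  # loaded component IDs, most recently used first
        self.free_slots = list(range(n_slots))
        self.loads = 0                 # number of reconfigurations performed

    def _load(self, cid):
        if not self.free_slots:        # replacement strategy: evict least recently used
            victim = self.lru.pop()
            addr = self.T[victim]
            self.T[victim] = 0
            self.free_slots.append((addr - self.base) // self.slot_size)
        slot = self.free_slots.pop(0)
        self.T[cid] = self.base + slot * self.slot_size
        self.loads += 1

    def enter_comp(self, cid, offset):
        if self.T.get(cid, 0) == 0:    # addr = 0: component is not loaded
            # real system: save context, fetch component from server, restore context
            self._load(cid)
        if cid in self.lru:
            self.lru.remove(cid)
        self.lru.insert(0, cid)        # update LRU order
        return self.T[cid] + offset    # destination D = addr + offset

m = ReconfManager(base=0x1000, slot_size=0x100, n_slots=2)
print(hex(m.enter_comp(1, 4)), hex(m.enter_comp(2, 0)), hex(m.enter_comp(3, 0)))
```

In the usage line the third call finds no free slot, so component 1, the least
recently used one, is evicted and component 3 is placed in its slot.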
.. Memory Management
In order to keep the placement of components as simple as possible the reconfig-
uration manager uses a page based memory allocation policy for placing loaded
components in memory. Figure  illustrates this. The reconfiguration manager al-
locates a fixed sized area inside the memory and splits up the area into fixed sized
slots. As the approach does not assume the existence of a memory management unit
this imposes a restriction on the size of the components. How this is incorporated
into the creation of the final reconfiguration components is described inside the next
chapter.
Figure : Activities of the reconfiguration manager upon entering a component.
[Flowchart nodes: enter_comp(cID, offset); load physical address addr = T(cID);
addr = 0?; check for concurrent reconfigurations; accept?; block thread; save context;
apply replacement strategy; load component; restore context; calculate destination
D = addr + offset; update LRU stack; continue execution at D.]
The memory space available for the reconfiguration manager is in general much
smaller than the sum of all component sizes in order to reduce the overall footprint
of the application. This, however, means that components need to be removed at
run-time to create space for new components, which shall be executed. This involves
finding suitable removal candidates by a replacement strategy.
.. Replacement Strategy
If a component needs to be removed in order to make room for a new component,
the reconfiguration manager uses a replacement algorithm to find the most suitable
component to be removed. The decision which replacement algorithm to use has a
major influence on the performance of the reconfiguration. The framework uses the
Least Recently Used (LRU) replacement algorithm to replace the component which
has not been used for the longest time. In order to keep the run-time overhead as
small as possible the LRU data structure is a stack, which is, however, implemented
as a doubly linked list to increase the efficiency of the update operation.
Figure : Component placement using an LRU replacement data structure.
Figure : State of the LRU data structure, implemented as a doubly linked list, based
on Figure .
Figure  demonstrates a possible state for n = 3 component slots. The component
Loaded Map contains the reference to the slot the component is loaded to. Figure
 illustrates the content of the LRU data structure for this state. By using the index
inside the component Loaded Map the corresponding stack element can directly be
accessed for manipulation as the LRU linked list elements are stored in consecutive
memory locations. The top of the stack always points to the component slot which
has been most recently used. The end of the stack points to the component slot
which has been least recently used, thus being the next slot to be replaced.
The most important benefit of the data structure is the constant update time for
moving components inside the stack. Whenever a component is referenced, it gets
shifted to the top of the stack. Finding the least recently used component is also
done in constant time. This characteristic makes the algorithm a well-suited candidate
for the replacement algorithm as the overhead needs to be kept very small to ensure
that the cost of taking a reconfiguration edge is also kept small.
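A minimal sketch of such an LRU stack is given next. It assumes one list element per
slot, stored in index-addressable arrays so that an element is found directly by its
slot number; the class LruStack and its method names are hypothetical.

```python
class LruStack:
    def __init__(self, n_slots):
        # Initially slot 0 is on top and slot n_slots - 1 is at the end; -1 = no link.
        self.prev = [i - 1 for i in range(n_slots)]
        self.next = [i + 1 if i + 1 < n_slots else -1 for i in range(n_slots)]
        self.top, self.end = 0, n_slots - 1

    def touch(self, slot):
        """Move a referenced slot to the top in constant time."""
        if slot == self.top:
            return
        p, n = self.prev[slot], self.next[slot]
        self.next[p] = n               # unlink; p exists because slot is not the top
        if n != -1:
            self.prev[n] = p
        else:
            self.end = p               # slot was the end of the stack
        self.prev[slot], self.next[slot] = -1, self.top
        self.prev[self.top] = slot
        self.top = slot

    def victim(self):
        """Least recently used slot, i.e. the next one to be replaced."""
        return self.end

    def order(self):
        out, i = [], self.top
        while i != -1:
            out.append(i)
            i = self.next[i]
        return out

stack = LruStack(3)
stack.touch(2)
stack.touch(0)
print(stack.order(), stack.victim())
```

Both touch and victim only follow a fixed number of links, which reflects the
constant update and lookup time discussed above.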
The LRU algorithm often benefits from the natural calling hierarchy of an
application. Often reconfiguration edges are call edges to another component. During the
execution, the LRU stack will keep track of the called components and keep the most
recently used ones loaded. Upon return from a set of function calls the order of the
components inside the stack will naturally reflect the return order of the application.
The n most recently used ones will still be loaded. Those components will also be
the next demanded components with a high probability.
.. Indirection Layer
In addition to loading and replacing components at runtime the reconfiguration
manager is also responsible for redirecting control flows between components and
the operating system. In order to avoid runtime linking a redirection layer is inserted
which is encapsulated inside the enter_comp method. Components are statically
linked before system deployment by storing offsets inside the component's binary
code. Upon loading a new component, calls are redirected by the reconfiguration
manager using the translation table T. It thus only needs one table lookup to
identify the component's memory address and forward a call to the corresponding
offset. Example instrumentation code used by the components can be found inside
the Appendix. The linking process done statically is explained in Chapter .
. operating system integration
The reconfiguration manager is a component which is automatically generated by
the reconfiguration framework. However, the integration of the reconfiguration man-
ager needs to be done manually by the system developer. As this introduces some
overhead the integration interface is kept as small as possible in order to allow a
fast and small integration of the reconfiguration concept.
In order to work properly, the interface reconf_os_if, as depicted in Figure ,
needs to be implemented by the OS and provided to the reconfiguration manager.
Amongst others, it defines a method rc_os_send_packet, which is used to send
reconfiguration packets to the reconfiguration server. The reconfiguration manager
uses the method to pass reconfiguration packets to the OS, which is responsible
for delivering them to the reconfiguration server. It therefore has to provide some
communication channel over which the reconfiguration packets are sent.
The operating system also has to provide implementations for the two methods
rc_os_curthread_pause and rc_os_thread_resume which, as the names suggest,
allow the reconfiguration manager to block the currently running thread, with its
context saved at address sp, and/or resume a thread after reconfiguration has
finished.
«interface» reconf_os_if
    rc_os_send_packet(struct reconf_packet rc_packet)
    rc_os_curthread_pause(int sp)
    rc_os_thread_resume(int sp)
«interface» reconf_if
    rc_init()
    rc_packet_input(struct reconf_packet rc_packet)
    enter_comp <<asm, RC_ABI>>
Figure : The interface required/provided by the reconfiguration manager.
. real time characteristics
The reconfiguration has a considerable influence on the timing behavior of the tasks
and threads inside the system. A reconfiguration will introduce some blocking time
as the executing thread is frozen until the desired component is loaded. However,
the operating system may continue the execution of other ready threads whenever
a reconfiguration is active. If an additional reconfiguration is triggered while a
reconfiguration is still ongoing, the issuing thread will be blocked as well and needs
to wait until the thread that issued the earlier reconfiguration has finished its
execution after reconfiguration. This introduces some timing dependencies which did
not exist in the static system without reconfiguration.
Figure  illustrates such a situation. Task  is executed first, as it is the currently
highest priority active task, and issues a reconfiguration at t1. The OS starts
receiving the component using the reconfiguration protocol explained before. Receiving
the corresponding packets is handled with the same priority as the issuing task.
However, some time δi, between receiving parts of the component, may be available
to serve lower priority tasks, because the operating system has to wait for data to be
sent by the reconfiguration server. This usually leaves some time to execute other
tasks in the system. Task  arrives after t1, is executed and issues another
reconfiguration at t2. The reconfiguration is delayed until task  has completely
finished its execution, as only one task at a time is allowed to trigger
reconfigurations. This is illustrated by the time interval B1 in Figure . This restriction
is fundamental for the calculation of the worst case blocking time as explained in
Chapter . Relaxing this restriction to allow any executing unit to trigger a
reconfiguration at any time would require the calculation of the worst case blocking time
to assume that a component replacement always needs to take place, leading to a
much higher blocking time.
Figure : Illustration of the reconfiguration blocking time and the finish time of a task.
The delay caused by the reconfiguration needs to be incorporated into the
schedulability analysis of the system as an additional blocking time, similar to the blocking
time caused by mutually exclusive resource accesses [But]. The task model used is
the periodic task model [LL], which is a widely used deterministic workload model.
It models repeatedly executed workloads (computations or data transmissions) as
periodic tasks which are described by the following characteristics. Every periodic
task consists of jobs which are described by their worst case execution time Ci
and their period Ti. A job must be completed before its deadline Di relative to its
release time. The set of tasks used inside the system is denoted as
Γ = {τ1, ..., τn}
with
τi = (Ci, Ti,Di)
and Ci being the worst case execution time and Ti being the period of the task. The
deadline is often assumed to be the same as the period, thus Di = Ti. Under the
assumption of using a fixed priority scheduling algorithm such as Rate Monotonic
(RM) or Deadline Monotonic (DM) [But], the schedulability of the system can be
checked using the response time analysis. The analysis formula needs to be adapted
to incorporate the additional blocking time due to reconfiguration. The response
time of a task under reconfiguration is given by the equation
\[
R_i = C_i + (B_i + \kappa_i) + \sum_{j=1}^{i-1} \left\lceil \frac{R_i}{T_j} \right\rceil \cdot C_j
\]
with τi being sorted by priority (τ1 being the highest priority task), κi being the
worst case blocking time experienced by task i due to reconfiguration and Bi being
the time a task may be blocked by an ongoing reconfiguration. The schedulability
is guaranteed if ∀i: Ri ≤ Di. The value of κi heavily depends on the design of the
system. While the blocking time introduced for loading a single component can
be calculated using the equations in Section ., the overall blocking time κi also
depends on the executed path at runtime. The estimation of κi is explained in
Section .
The worst case value of Bi can be calculated in the following way. Additional
blocking times can only be introduced by lower priority tasks which are currently issuing
a reconfiguration. Reconfigurations of higher priority tasks are already considered
by the interference term $\sum_{j=1}^{i-1} \lceil R_i / T_j \rceil \cdot C_j$
of Formula . Due to the reconfiguration protocol only one task at a time can trigger
a reconfiguration. Additionally, a task's reconfiguration request may only be blocked
once by a lower priority task. Thus, the worst case value for Bi is the maximum,
over all lower priority tasks j > i that may block the reconfiguration of task i, of
their blocking time κj plus their execution time Cj:
\[
B_i = \max \{ \kappa_n + C_n, \kappa_{n-1} + C_{n-1}, \ldots, \kappa_{i+1} + C_{i+1} \}
\]
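The computation of Bi is a simple maximum over the lower priority tasks, as the
following sketch shows. The function name is hypothetical, tasks are indexed from 0
in priority order, and setting Bi = 0 for the lowest priority task (empty maximum) is
an assumption of this sketch.

```python
def reconfiguration_blocking(kappa, C):
    """B_i = max over lower priority tasks j > i of (kappa_j + C_j)."""
    n = len(C)
    return [max((kappa[j] + C[j] for j in range(i + 1, n)), default=0)
            for i in range(n)]

# Three tasks, highest priority first.
print(reconfiguration_blocking(kappa=[1, 0, 2], C=[1, 2, 3]))
```

In the example the third task dominates with κ + C = 5, so both higher priority tasks
obtain a blocking value of 5.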
Using a dynamic priority scheduling algorithm as, e.g., Earliest Deadline First (EDF)
[But] the processor demand analysis can be utilized to test the schedulability of the
system. Adapting the processor demand formula to consider the additional blocking
time due to reconfiguration leads to the following schedulability test:
\[
\forall i \; \forall L: \quad (B_i + \kappa_i) + \sum_{j=1}^{n} \left\lceil \frac{L + T_j - D_j}{T_j} \right\rceil \cdot C_j \le L
\]
The reconfiguration has some interesting effects on lower priority tasks. As seen in
Figure , task  is able to finish before the highest priority task . This is due to
the effect that task  is still blocked by the ongoing reconfiguration and some spare
time δ3 (while waiting for data to arrive), without having an effect on the execution
time of task , can be given to the lower priority task.
A schedulability analysis based on Equation  or  is only sufficient, not necessary,
because a task may actually never experience blocking. Additionally, a value of
δi > 0 (as, e.g., δ3 in Figure ) will further decrease the response time.
. summary
Inside this chapter the reconfiguration manager, the reconfiguration protocol and
the integration into the OS are explained. The first part of this chapter gives an
overview of the interfaces needed to connect the reconfiguration manager to the
OS. The second part concentrates on the protocol for loading components and the
replacement strategy used by the reconfiguration manager. The general design of
the reconfiguration manager follows the principle of minimizing the overhead which
needs to be added to the system. Therefore, a minimal set of interface functions is
used and a very resource efficient replacement strategy is implemented. The last part
concentrates on the real-time parameters of the system under reconfiguration and
gives a formula for the schedulability analysis of the system under reconfiguration.
The schedulability analysis incorporates the blocking time of a task waiting for
ongoing reconfigurations inside the system. The determination of this blocking time
will be a major part of the next chapter.
8
COMPONENT OPTIMIZATION
In the last chapters the reconfiguration approach and the identification of
components have been introduced. The extracted components may now be used for
reconfiguration. However, using the components in this state may be far from optimal if
factors such as worst case execution time, binary overhead or memory fragmentation
are considered. This chapter will introduce an optimization algorithm which optimizes
the reconfiguration components with respect to the runtime and memory overhead
based on the environmental restrictions.
. target system restrictions / notation
The following restrictions are considered for the runtime reconfiguration approach.
The application code is assumed to be stored in flash memory. This is very typical
for small embedded systems as flash memory is a very cost effective memory solution
for non-volatile storage. The use of flash memory, however, introduces some serious
restrictions on the reconfiguration approach. The use of flash memory inherently
raises the demand of reducing the amount of flash page writes at runtime as the
lifetime of the memory page is limited by a certain amount of erase/write operations.
Thus, one objective optimization function would try to minimize the amount of such
operations. A direct restriction for a page based memory allocation algorithm as used
for the component reconfiguration is that the page size is a multiple of the flash page
size. This avoids additional erase/write cycles for flash pages which are occupied by
more than one component, thus, increasing the life time of the memory.
• The reconfiguration space consists of n pages of size Pc. The whole memory
space available for reconfiguration is thus Pm = n · Pc.
• The page size Pc is a multiple of the hardware flash page size Pf. Thus, Pc =
r · Pf.
• A component size may not be bigger than the reconfiguration page size Pc.
Figure : Illustration of the sizes Pc, Pf, si and λi.
Figure  illustrates the page sizes and the relation to each other. Depending on
the reconfiguration page size Pc, the total reconfiguration space Pm, the component
sizes and the application control flow itself, the worst case number of reconfigurations
and the binary overhead may change significantly. The component size si itself is
composed of the accumulated sizes of the nodes (basic blocks) of the component
(denoted w(Si)) and the size λi. Thus, si = λi +w(Si). λi = λ(Si) is the size of
the instrumentation code added to a component to implement the reconfiguration
behavior. It depends on the number and type of edges going into the Mandatory Set
or to other components and the ISA of the system. The code is not necessarily added
at the end of the component as depicted in Figure . The figure just illustrates the
two different parts of a component.
In the example the size si of component Si is smaller than Pc. This, however,
may not be the case for all components. Components with sizes bigger than the
reconfiguration page size need to be split up into multiple new components to fit
into the reconfiguration slots. This will also have influence on the value λi as new
reconfiguration edges will be introduced.
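The size relations can be checked with a few lines of code. The sketch below is only
illustrative: the function name check_design and the example sizes are hypothetical,
and each component is given as a pair of its node size w(Si) and its instrumentation
size λi.

```python
def check_design(P_f, r, n, components):
    """Return P_c, P_m and the components whose size s_i = lambda_i + w(S_i) > P_c."""
    P_c = r * P_f                      # reconfiguration page size
    P_m = n * P_c                      # total reconfiguration space
    too_big = [name for name, (w, lam) in components.items() if w + lam > P_c]
    return P_c, P_m, too_big

# Flash pages of 256 bytes, slots of two flash pages, three slots.
sizes = {"S1": (400, 40), "S2": (500, 30)}
print(check_design(P_f=256, r=2, n=3, components=sizes))
```

Here S2 has a size of 530 bytes and exceeds the slot size of 512 bytes, so it would
have to be split into multiple components.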
Thus, in the next step a component optimization step will calculate an optimized,
although not necessarily optimal, design configuration for the target system.
. optimization steps
The optimization of the components used inside the system is done in multiple steps.
Figure  illustrates the different steps in finding a suitable design for the system
under reconfiguration. Initially a design is specified by its two design parameters
(Pm,Pc), describing the maximum memory size used for reconfiguration and the
component slot size. For every design an iterative optimization cycle consisting of
three steps is executed. The first step is the calculation of the worst case blocking
time κ for the current unoptimized design. Under which conditions and how this
value is determined is covered in Section .. The next step is either a component
merging step for "too small" components or a partitioning step for "too big" compo-
nents. For the purpose of finding a "good" design with respect to the optimization
parameters worst case reconfiguration blocking time, reconfiguration space, recon-
figuration slot size and flash wear-out the Pareto optimal designs are calculated last.
A tie breaker function finally selects one design based on specific user parameters.
This will be described in Section ..
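The selection of Pareto optimal designs followed by a tie breaker can be sketched as
follows. This is a generic illustration, not the selection procedure of the
framework: designs are reduced to tuples of objective values that are all minimized,
and the weighted sum used as tie breaker is a hypothetical choice.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(designs):
    """Keep all designs that are not dominated by any other design."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

def tie_break(front, weights):
    """Pick one design by a weighted sum of its objectives (assumed criterion)."""
    return min(front, key=lambda d: sum(w * x for w, x in zip(weights, d)))

# Each design: (worst case blocking time, reconfiguration space), example values.
designs = [(10, 4), (8, 6), (12, 4), (8, 8)]
front = pareto_front(designs)
print(front, tie_break(front, weights=(1, 2)))
```

In the example two of the four designs are dominated; the tie breaker then prefers the
design with the smaller reconfiguration space because that objective is weighted
more heavily.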
.. Component Partitioning
As components exceeding the reconfiguration slot size need to be partitioned in
order to fulfill the size requirement si <= Pc, the general partitioning function will
be defined next.
Definition .. (Component Partitioning Function):
A Component Partitioning Function is a function
θ : N×N→ P(N)
which partitions the component Si ∈ N into multiple components Si,1, ...,Si,n of
a maximum size in N. The function needs to consider the overhead λi,j of each
partition so that the maximum size is not exceeded.
Finding an optimal partitioning, which consists of a set of components that mini-
mizes the worst case blocking time introduced by reconfigurations, while optimizing
the memory usage and fulfilling the size constraint si <= Pc, is NP-hard. This
involves solving the k-partitioning problem with size restriction on the union of
all components, which has been shown to be NP-hard [MP]. The problem is even
more complicated by the fact that the weight of each edge of the graph may change
depending on which nodes are contained in each component. Finding an optimal
solution exceeds the scope of this thesis. Thus, the framework uses a heuristic, which
is described in the following.
Footnote: The weight of an edge is the worst case reconfiguration blocking time for
loading the component the edge is leading to in the context of the k-partitioning problem.
Figure : Overview of the optimization steps involved in the selection of a suitable design.
(Diagram labels: ICFG Component Definition, Iterative Optimization, Calculation,
Component Merging, Component Partitioning, Design, Design Optimization,
Pareto Optimality Checker, Tie Breaker.)
Figure : The partitioning function θobj illustrated. A component is partitioned into
multiple components based on the linear order of the basic blocks inside their
corresponding object file. (Diagram labels: obj1, obj2.)
A simple Component Partitioning Function θobj, used for the evaluation, arranges
(or schedules) the basic blocks in linear order as they are found inside the object
files. Figure  illustrates this partitioning. The component Si containing basic
blocks, which are not necessarily stored contiguously inside their object file, is partitioned
into multiple components Si,1, ..,Si,4 which fulfill the size constraint si,j <= Pc.
No special optimization algorithm is used to ensure a better partitioning of the
basic blocks. This is a very simple partitioning function, which cannot guarantee
an optimal partitioning. Finding a better algorithm, which tries to minimize the
number of reconfigurations, is considered future work. This will also improve the
performance of the reconfiguration.
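The linear arrangement performed by θobj can be illustrated with a short sketch. The following Python fragment is not the implementation used by the framework. The tuple layout (object file, offset, size), the function name partition_linear and the constant per-partition overhead (standing in for λi,j) are assumptions made for illustration only.

```python
def partition_linear(blocks, slot_size, overhead=0):
    """Split a component into partitions that each fit into one slot.

    blocks    -- iterable of (object_file, offset, size) tuples (assumed layout)
    slot_size -- reconfiguration slot size Pc in bytes
    overhead  -- assumed constant per-partition overhead in bytes
    """
    partitions, current, used = [], [], overhead
    # Linear order: by object file name, then by offset inside that file.
    for obj, offset, size in sorted(blocks):
        if overhead + size > slot_size:
            raise ValueError("basic block larger than a reconfiguration slot")
        if used + size > slot_size:
            # The block does not fit any more: close the current partition.
            partitions.append(current)
            current, used = [], overhead
        current.append((obj, offset, size))
        used += size
    if current:
        partitions.append(current)
    return partitions


# Hypothetical component with four basic blocks taken from one object file.
component = [("obj1", 0x40, 30), ("obj1", 0x00, 20),
             ("obj1", 0x20, 25), ("obj1", 0x80, 40)]
parts = partition_linear(component, slot_size=64, overhead=4)
```

With these made-up sizes the component is split into three partitions. The sketch makes no attempt to keep blocks that call each other in the same partition, which is exactly the limitation noted above.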
.. Component Merging
During Algorithm  the initially computed sets of basic blocks are split up into
distinct sets which are used as reconfiguration components inside the system. How-
ever, some components may have a very small size compared to the reconfiguration
slot size Pc. This leads to a bad memory utilization and very often to a bad recon-
figuration delay as one reconfiguration slot is blocked by a small component. This
may be optimized by merging small components with their direct successors inside
the dependency tree if they fulfill a certain criterion. Before stating this criterion,
the worst case blocking time function needs to be specified. The blocking time is
the time a task is blocked due to reconfiguration. An algorithm for estimating this
blocking time will be given in the next section.
Definition .. (Worst Case Blocking Time Function):
Let κ be the function
κ : Ĝ × P(N) → N
which returns the worst case reconfiguration blocking time for an application given
as its ICFG, with Ĝ being the set of all graphs, and a set of components.
How the function value of κ can be approximated will be shown inside the next
section. For now only the existence of such a function is assumed in order to allow
components to be merged.
As a local optimization rule, two components are combined if their union does
not yield an additional component after partitioning (which would be highly
counterproductive if components are to be merged) and if the worst case blocking
time due to reconfiguration does not increase:
|θ(Si ∪ Sj, Pc)| = |θ(Sj, Pc)| ()
∧
κ(G, θ(S1, Pc) ∪ ... ∪ θ(Si ∪ Sj, Pc) ∪ ... ∪ θ(Sn, Pc))
≤ κ(G, θ(S1, Pc) ∪ ... ∪ θ(Sn, Pc)) ()
Figure  depicts some example reconfiguration time intervals which help to explain
the optimization rule. In part a) of the Figure the reconfiguration takes place on the
original component set, consisting of two small components. Two complete reconfig-
uration cycles take place. If the merged size of both components does not yield an
additional component after partitioning the merged component into smaller com-
ponents to fit into the reconfiguration page size Pc, the worst case reconfiguration
time may be decreased. This is due to the fact that the additional time intervals
trm, tx and tissue far outweigh the cost of sending an additional data packet on
an already established reconfiguration connection as depicted in Figure  c). This
is even clearer if the additional data fits into the previous data packet and/or
flash page as seen in part b) of the figure. In this case the reconfiguration manager
may even decrease the number of flash page erase/write cycles, which are among the
most costly parts of the reconfiguration. As this might not always be the case, rule
() needs to be fulfilled as well to ensure that the worst case blocking time does
not increase.
Figure : Example reconfiguration time intervals for a) not merged components, b)
merged components with size fitting into the last chunk of data, c) merged
components with an additional data transfer cycle.
Another side effect of merging two components based on the above rule is the re-
duction of the number of reconfigurations for the worst case execution path. This
results in a longer lifetime of the flash memory and thus the complete system. Com-
ponents are merged using Algorithm . During each iteration step the current worst
case reconfiguration blocking time needs to be calculated by the function κ. How
this can be done is explained inside the next section.
Algorithm  Component Merging
procedure mergeComps(S1, .., Sn)
    R = {S1, .., Sn}
    while ∃ Si, Sj ∈ R with Si → Sj which fulfill () and () do
        R = R \ {Si}
        R = R \ {Sj}
        R = R ∪ {Si ∪ Sj}
    end while
    return R
end procedure
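The merging loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the framework's code: components are modelled as frozensets of basic blocks, the dependency relation Si → Sj as a set of pairs, and theta and kappa are passed in as callables (with the ICFG assumed to be captured inside kappa). The toy_theta and toy_kappa helpers are hypothetical stand-ins used only to make the example executable.

```python
def merge_components(components, edges, theta, kappa, slot_size):
    """Merge components along dependency edges while both rules hold."""
    comps = set(components)          # R = {S1, .., Sn}
    deps = set(edges)                # (Si, Sj) means Si -> Sj

    def design(cs):
        # All partitions of all components: the component set handed to kappa.
        return [part for c in cs for part in theta(c, slot_size)]

    merged_something = True
    while merged_something:
        merged_something = False
        for si, sj in list(deps):
            union = si | sj
            candidate = (comps - {si, sj}) | {union}
            # Rule 1: merging must not create an additional partition.
            same_count = len(theta(union, slot_size)) == len(theta(sj, slot_size))
            # Rule 2: the worst case blocking time must not increase.
            if same_count and kappa(design(candidate)) <= kappa(design(comps)):
                comps = candidate
                # Redirect edges of Si and Sj to the merged component.
                deps = {(union if a in (si, sj) else a,
                         union if b in (si, sj) else b) for a, b in deps}
                deps = {(a, b) for a, b in deps if a != b}
                merged_something = True
                break
    return comps


def toy_theta(component, slot_size):
    """Hypothetical partitioner: greedy packing of (name, size) blocks."""
    parts, current, used = [], [], 0
    for name, size in sorted(component):
        if used + size > slot_size:
            parts.append(frozenset(current))
            current, used = [], 0
        current.append((name, size))
        used += size
    return parts + [frozenset(current)] if current else parts


def toy_kappa(parts):
    """Hypothetical cost model: every partition costs 10 time units."""
    return 10 * len(parts)


A = frozenset({("a", 10)})
B = frozenset({("b", 20)})
C = frozenset({("c", 60)})
merged = merge_components({A, B, C}, {(A, B), (B, C)}, toy_theta, toy_kappa, 64)
```

In this toy setting A and B are merged because their union still fits into one slot, while merging with C is rejected since it would produce an additional partition.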
. calculating the worst case blocking time κ
An important step inside the optimization cycle and the following design space explo-
ration is the calculation of the accumulated blocking time, due to reconfigurations,
the application may encounter in the worst case. The process of calculating this time
is closely related to the problem of determining the worst case execution time of an
application in general. Theiling [The] used implicit path enumeration [LM] to
formulate the worst case context sensitive paths as a set of boolean SAT-formulae.
This allows a SAT-solver to efficiently calculate the worst case path. Loops are in
general handled by loop bounds which need to be known, either by program anal-
ysis or by user input. However, this approach is only possible if the edge weights
are constant. Unfortunately, the edge weights of the reconfigurable application are
dynamic as they depend on the path taken through the application. Some reconfiguration
edges impose a high overhead if the component is not loaded, while other edges
have no blocking time overhead as the component is currently loaded. Thus, the
calculation of the edge weights already implies an explicit generation of all paths
through the program, which renders state of the art approaches for worst case exe-
cution time measurements (like the implicit path enumeration [The]) unusable.
.. Efficient WCET Calculation by Path Enumeration
The framework performs an explicit path enumeration on the context sensitive
interprocedural control flow graph using a modified depth first search. As loops inside
the control flow graph may introduce an infinite number of paths, the algorithm
uses a specific condition to ensure the termination of the path enumeration. This is
explained at the end of this section. Further considerations on this topic are given
in Chapter A..
During the path traversal an additional call string, called the reconfiguration context,
is stored:
Definition .. (Reconfiguration Context):
Let C1 = (G1,E1), ...,Cn = (Gn,En) be components. A Reconfiguration Context is
a call string ξ ∈ E∗r, with
\[ E_r^{*} = \bigcup_{i=1}^{n} E_i \]
The reconfiguration context, thus, only contains reconfiguration edges. The context
is created on the fly, during the depth first search on the context sensitive inter-
procedural control flow graph whenever a reconfiguration edge is taken. The context
sensitive inter-procedural control flow graph is constructed as described in Definition
.. on the set of call and return edges. However, the context connector used for
the construction is defined as follows.
Figure : Illustration of the path traversal on the context sensitive graph. During the
traversal the node ni is reached with different reconfiguration contexts.
Definition ..:
The connection function ⊕ is a function
⊕ : E∗ × E→ E∗
with E being the set of call/return edges which connects two call strings in the
following way:
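\[ \epsilon \oplus e_1 = (e_1) \]
\[
(e_1, e_2, \ldots, e_n) \oplus e_{n+1} =
\begin{cases}
(e_1, e_2, \ldots, e_{n-1}) & \text{if } e_{n+1} \text{ is the corresponding return edge of the call edge } e_n \\
(e_1, e_2, \ldots, e_n, e_{n+1}) & \text{otherwise}
\end{cases}
\]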
The context connector of Definition .. allows nodes with the same calling context
to be modeled as the same node, abstracting away finished function calls. This is
feasible since the context information of finished function calls is not of interest
for the determination of the worst case reconfiguration delay. The calling context
ensures, however, that the path enumeration continues at the correct calling node
for every return edge taken, making the path enumeration much more precise in
contrast to a context in-sensitive enumeration.
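The behaviour of the context connector can be illustrated with a few lines of Python. The Edge type and its returns_from field are hypothetical modelling choices for this sketch; the thesis only requires that a return edge can be matched to its call edge.

```python
from typing import NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    name: str
    # Name of the matching call edge if this is a return edge (assumed encoding).
    returns_from: Optional[str] = None


def connect(context: Tuple[Edge, ...], edge: Edge) -> Tuple[Edge, ...]:
    """Extend a call string by one edge, dropping finished calls."""
    if context and edge.returns_from == context[-1].name:
        # The return edge matches the last call edge: pop it from the context.
        return context[:-1]
    return context + (edge,)


call_f = Edge("call_f")
call_g = Edge("call_g")
ret_g = Edge("ret_g", returns_from="call_g")

ctx = connect((), call_f)          # enter f
ctx = connect(ctx, call_g)         # f calls g
ctx_after_return = connect(ctx, ret_g)   # g returns into f
```

After the return from g the context is the same as before the call, so both program points are modelled by the same context sensitive node.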
Definition .. (Reconfiguration Context Delay Function):
Let κc be the function
κc : E∗ → N
which returns the reconfiguration blocking time for a reconfiguration context ξ ∈ E∗.
The function κc takes the reconfiguration context and simulates the reconfiguration
replacement function upon this context, starting with the worst case scenario
of no component being loaded at the beginning. It sums up the blocking time of every
component that needs to be loaded. Assuming the worst case time intervals (as
described in Section .) for loading the components are known, the blocking time
introduced for loading a component Si is known. In consequence, the blocking time
of a reconfiguration context is known as well.
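A simulation of this kind might look like the following Python sketch. The replacement strategy is an assumption here: a simple FIFO replacement over n slots is used for illustration, and the context is given directly as the sequence of components entered instead of the reconfiguration edges. The names context_blocking_time and load_time are hypothetical.

```python
from collections import deque


def context_blocking_time(context, load_time, slots):
    """Sum the load times of all components that are not resident when entered.

    context   -- sequence of component names in the order they are entered
    load_time -- dict: component name -> worst case load time
    slots     -- number of reconfiguration slots n
    Returns (total blocking time, set of components loaded at the end).
    """
    loaded = deque()                 # worst case: nothing is loaded initially
    total = 0
    for comp in context:
        if comp in loaded:
            continue                 # component is resident: no blocking time
        if len(loaded) == slots:
            loaded.popleft()         # assumed FIFO replacement
        loaded.append(comp)
        total += load_time[comp]
    return total, frozenset(loaded)


# Hypothetical context and load times.
times = {"c1": 10, "c2": 10, "c3": 10, "c4": 10}
total, resident = context_blocking_time(
    ["c1", "c2", "c1", "c3", "c4", "c1"], times, slots=3)
```

In this example the second visit of c1 causes no delay, whereas the third one does, because c1 has been replaced by c4 in the meantime.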
Figure  depicts a possible situation during the path enumeration of the worst case
blocking time analysis. During the path enumeration the path (n2, ) → (ni,ω) may
be analyzed first. The reconfiguration context upon reaching node (ni,ω) is ξ1. The
complete subtree following node (ni,ω) is analyzed by the framework and a worst
case reconfiguration delay is calculated. At some point the depth first search will take
the second path (n3, ) → (ni,ω), reaching the node with a reconfiguration context
ξ2. It is tempting to assume that the search can be stopped here if the blocking time
caused by ξ2 is smaller than the blocking time caused by ξ1. However, this is not a
valid condition, as the counterexample in Figure  illustrates. For convenience, the
components entered are shown inside the reconfiguration call string instead
of the edges. Although the path in b) has a higher number of reconfigurations
at node n1, the path in a) results in a higher number of reconfigurations at a
later point of the path, in this case node n3. This is because the
components c5 and c6 are still loaded on the path in b), while the path in a) needs
to load them. Thus, the above condition is not sufficient on its own, but it is useful
in combination with some other conditions.
In contrast to a depth first search the explicit path enumeration may not stop
whenever a node is visited again. This makes the explicit enumeration very expensive
(assuming termination is guaranteed). However, the search may stop traversing a
path at node (ni,ω) if one of the following conditions is true, which may speed up
the algorithm runtime.
Theorem .. (Reconfiguration Bound Condition):
Let (n1,ω) be the current node which is visited during a depth-first search on
the context sensitive graph Gc. Let ξ1 = (c1,1, c1,2, ..., c1,k1) be its reconfiguration
context upon reaching this node. Let rcmax be the current maximum blocking time
of the paths visited by the depth-first search. Let n be the number of reconfiguration
slots available. If the node (n1,ω) has been visited before with a reconfiguration
context ξ2 = (c2,1, c2,2, ..., c2,k2) and one of the following conditions holds
a. ξ1 = ξ2
b. κc(ξ1) ≤ κc(ξ2) ∧ S1 = S2 with
Si being the set of the last distinct n components loaded by ξi
then the current path iteration can stop, as rcmax cannot increase on the current
path.
Figure : Example of two reconfiguration contexts during a path traversal with n = 3
component slots. The dotted square around a set of nodes defines a
component. The underlined elements inside a context trigger a reconfiguration
as the component entered is currently not loaded.
Proof. If the path enumeration can stop at node (n1,ω), there must not exist a path
with reconfiguration context ξ3 starting at node (n1,ω) so that the concatenation
of both reconfiguration contexts yields a higher blocking time, or formally rcmax ≥
κc(ξ1 ⊕ ξ3).
a. Figure  illustrates the situation of the condition. Path p1 is traversed first,
path p2 afterwards. Upon traversal of the path p2, the edge ((n3,ω), (n1,ω)),
for the condition to take place, is a backwards edge. The subtree of node
(n1,ω) has, thus, been fully iterated by the path enumeration. As ξ1 = ξ2
the repeated traversal of the subtree would yield the same reconfiguration
contexts, thus rcmax cannot increase on the path p2.
Figure : Condition A. of the bound condition for the worst case reconfiguration delay.
b. The condition essentially states that the reconfiguration slots upon
reaching node (n1,ω) are filled with the same components as was the case
during a previous path enumeration, with reconfiguration context ξ2, reaching
this node. As the current state of the component slots is the same, all following
paths starting at node (n1,ω) will lead to the same additional blocking time
caused by the reconfiguration context ξ3. As κc(ξ1) ≤ κc(ξ2), it directly follows
that the current path cannot yield a higher worst case reconfiguration blocking
time rcmax, as rcmax ≥ κc(ξ2 ⊕ ξ3) ≥ κc(ξ1 ⊕ ξ3).
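The two bound conditions translate into a simple check that can be evaluated whenever a node is reached again. The sketch below assumes that, for each earlier visit of the node, the reconfiguration context, its blocking time κc and the set of currently loaded components have been recorded; the function name can_stop and the record layout are hypothetical.

```python
def can_stop(context, blocking_time, loaded, earlier_visits):
    """Return True if the enumeration may stop at the current node.

    context        -- current reconfiguration context (tuple of components)
    blocking_time  -- blocking time of the current context
    loaded         -- frozenset of components currently held in the slots
    earlier_visits -- list of (context, blocking_time, loaded) records
                      stored for previous visits of the same node
    """
    for ctx2, time2, loaded2 in earlier_visits:
        if context == ctx2:
            return True              # condition a: identical context
        if blocking_time <= time2 and loaded == loaded2:
            return True              # condition b: same slots, not more expensive
    return False


# Hypothetical records for one node.
earlier = [(("c1", "c2", "c3"), 30, frozenset({"c1", "c2", "c3"}))]
stop_a = can_stop(("c1", "c2", "c3"), 30, frozenset({"c1", "c2", "c3"}), earlier)
stop_b = can_stop(("c2", "c1", "c3"), 30, frozenset({"c1", "c2", "c3"}), earlier)
stop_c = can_stop(("c1", "c4"), 20, frozenset({"c1", "c4"}), earlier)
```

The third call must continue the traversal: although its blocking time is lower, the slots hold different components, so later reconfigurations may differ.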
.. Handling Cyclic Reconfigurations
Theorem .. gives two conditions which allow the detection of paths that cannot
yield an increased worst case blocking time caused by reconfigurations. Thus, the
path enumeration can stop under these conditions, making the calculation of the
worst case blocking time faster, however, not necessarily finite. To ensure the termi-
nation of the algorithm the cyclic dependencies of components (detectable by loops
inside the dependency graph) need to be considered.
The initial dependency graph of the components generated by Algorithm  did not
contain any loops as shown by Lemma . However, the partitioning step can generate
components which do not fulfill this property any more, as loops may have been
introduced between components again. Thus, it is not guaranteed that the algorithm
terminates. Loops inside the control flow graph may cause an infinite loop of the
path enumeration algorithm as an infinite number of paths may be generated. In
order to guarantee the termination of the algorithm, every loop containing multiple
components is unrolled once. Figure  illustrates this. The loop unrolling simulates
one iteration of the loop, thereby introducing copies of the nodes (n1,ω) and (n2,ω).
Due to the reconfiguration protocol used, all reconfigurations during the iteration
of the loop can be modeled this way if the number of components used inside the
loop does not exceed the number of reconfiguration slots n. However, if the number
of components exceeds n, the algorithm terminates with an unknown worst case
reconfiguration delay: as the number of iterations of the loop is unknown, the number
of reconfigurations may increase indefinitely as well. Additional considerations on
this topic can be found in Section A..
Figure : Loop unrolling for the calculation of the worst case reconfiguration blocking
time. The loop is unrolled once to simulate one iteration of it. All following
iterations are dropped, resulting in a finite number of paths.
Generally, the precision of the analysis may be increased by using loop bounds,
calculated by a data flow analysis or given as user input. This, however, is beyond the
scope of this thesis and may be added to the algorithm in future work. In most cases
configurations containing cyclic reconfiguration paths of length greater than n are
undesirable, as the worst case execution time will increase dramatically, disqualifying
the configuration from the set of Pareto optimal configurations.
.. Speedup of the Algorithm
Although Theorem .. allows the framework to discard candidate paths, which
cannot lead to the maximum worst case reconfiguration delay, the enumeration of
the remaining paths may easily become infeasible for larger applications. As an
example the duration of the worst case reconfiguration blocking time analysis is
listed inside Table  for an example application with  nodes. The enumeration
has been done on the ICFG using the bound conditions of Theorem ... The
analysis took around  seconds on a standard Linux machine (bit) running on
an Intel Pentium  . Ghz processor, even with the bound conditions of Theorem
... Without applying Theorem .. the enumeration did not finish before a heap
overflow occurred inside the Java Virtual Machine, running with GB of heap space.
As the calculation needs to be done several times during the component optimization
phase, the performance of the worst case reconfiguration blocking time calculation
algorithm is important.
In order to further speed up the path enumeration, the ICFG may
be reduced to a subset of the original graph by removing the nodes which are not
part of a path that may increase the reconfiguration blocking time. More precisely,
all nodes for which there exists no path to a reconfiguration component may be
removed from the ICFG. This can be done by doing a depth first search on the
predecessor graph from all component nodes. This will exclude paths from the path
enumeration that will never trigger a reconfiguration. Figure  depicts an example
graph. The reduced subgraph has been marked.
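The reduction amounts to a backward reachability computation. A minimal Python sketch, assuming the ICFG is available as a successor dictionary (the representation and the name reduce_icfg are illustrative, not taken from the framework):

```python
def reduce_icfg(successors, component_nodes):
    """Keep only nodes from which a component node can be reached.

    successors      -- dict: node -> set of successor nodes
    component_nodes -- nodes that belong to a reconfiguration component
    """
    # Build the predecessor graph once.
    predecessors = {}
    for node, succs in successors.items():
        for succ in succs:
            predecessors.setdefault(succ, set()).add(node)

    # Depth first search on the predecessor graph, started at all component nodes.
    keep, stack = set(), list(component_nodes)
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(predecessors.get(node, ()))

    # Restrict the graph to the marked nodes.
    return {n: successors.get(n, set()) & keep for n in keep}


# Hypothetical graph: only n0 -> n1 -> n2 leads to the component node n2.
icfg = {"n0": {"n1", "n4"}, "n1": {"n2"}, "n2": {"n3"},
        "n3": set(), "n4": {"n5"}, "n5": set()}
reduced = reduce_icfg(icfg, {"n2"})
```

Nodes n3, n4 and n5 are dropped because no path starting at them can enter a component and thus trigger a reconfiguration.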
Nodes Time
ICFG  , s
ICFGreduced  , s
Table : Path enumeration time for the evaluation application with and without speedup.
The complete application consists of  nodes. Five components with a total
number of  nodes have been used for the calculation.
Table  demonstrates the size of the reduced ICFG for the example application
used inside the evaluation chapter. The reduced graph only consists of  nodes,
making the enumeration much more efficient. The enumeration only took , s in
contrast to the  seconds taken on the original graph.
Figure : Reduction of the ICFG. Nodes lying on a path to one of the components S1
or S2 are marked and may be used for the path enumeration.
The reduction of the graph allows for a reduction of the algorithm execution time.
However, in the worst case every single node inside the graph may be part of a
path to a node inside a component. In these, admittedly rare, cases no reduction in
execution time will take place.
.. Quality of the Estimation
The estimation of the worst case reconfiguration delay as proposed in the previous
section is an overestimation of the actual worst case reconfiguration delay. The
assumption of the path traversal algorithm is that every single path can be executed
at runtime. However, not all paths can be executed at runtime as the conditions for
conditional edges may exclude each other. Such a situation is depicted inside Figure
. The highlighted path describes a path that will be enumerated during the path
traversal, however, the conditions r0 > 4 and r0 <= 4 exclude each other, assuming
r0 is not redefined in (n2,ω), (n3,ω) and (n4,ω).
For object code which has been compiled without optimization, this kind of control
flow is very common. Code optimizers that optimize for speed would, however, avoid
these code structures by, e.g., duplicating the code of node (n1,ω). Thus, the worst
case blocking time calculation for optimized code will be closer to the actual blocking
time than it would be for unoptimized code.
Uncertainties introduced by indirect jumps, overestimated by Hell Nodes, can further
reduce the quality of the estimation. Even a single indirect function call may
reduce the accuracy of the worst case analysis heavily. This is due to the fact that
the overestimation of the indirect jump targets may lead to many paths that may
actually never be taken by the application. As long as no reconfiguration component
is reachable by an uncertain indirect jump, the blocking time estimation will
be very precise. If this is not the case, the worst case blocking time can differ from
the real blocking time and can, thus, only be used as a hint for the design space
exploration. As there currently exists no precise algorithm for detecting the targets of
indirect jumps under all conditions, the framework depends on the user to resolve
uncertain indirect jumps which occur on reconfiguration paths.
Figure : Demonstration of a path inside the CFG which cannot be taken at runtime.
. design space exploration
The amount of parameters to the system creates a huge number of possible con-
figurations. The reconfiguration page size Pc and the total amount of memory for
reconfiguration Pm have a huge influence on the system parameters. The parameters
which are used for the characterization of a configuration are:
• Worst Case Reconfiguration Blocking Time dr:
The worst case reconfiguration overhead is the maximum delay caused by
reconfiguration that the application may experience when executing any possible
path inside the context sensitive control flow graph of the application. dr is
calculated by the function κ for every design point.
• Flash Wearout dw:
The flash wearout is the maximum number of erase/write cycles which occur
during the execution of the path with the longest worst case reconfiguration
blocking time. Depending on the number of reconfiguration slots and the
execution path, the flash pages suffer from differing wearouts.
• Reconfiguration Space Pm:
The total memory available for loading components onto the device.
Footnote: As long as the source code is not available.
In general, each of the parameters shall be minimized. However, the flash wearout
and the worst case reconfiguration blocking time are heavily influenced by the
amount of reconfiguration space available and the slot size Pc. Typically, the values
dw and dr evolve inversely proportional to the reconfiguration space Pm. However,
local minima may exist depending on the application. Finding these minima is the
goal of the following design space exploration step. The framework is capable of using
additional parameters, such as the binary overhead introduced by the reconfiguration
or the memory fragmentation. However, for the design space exploration the first
three parameters are used.
Using the three characteristics, a design point inside the design space of all possible
configurations is defined as:
Definition .. (Design Points):
A design point is a triple d(Pm,Pc) = (dr,dw,Pm).
For each design point the parameters dr and dw are calculated. The blocking time
dr is calculated as described inside the last section. Every design point describes a
specific configuration of the reconfiguration system under the restrictions Pm and
Pc. Finding an optimal design point is not always possible under the restriction
that all three parameters dr, dw and Pm shall be minimized. This is a classical
vector optimization problem and is solved by using the Graef-Younes method with
backward iteration [Jah] on the set of design points. It calculates the set D of Pareto
optimal points d for which there exists no other design point d′ that strictly dominates d.
For multi-dimensional vectors x = (x1, x2, .., xn) ∈ R^n and y = (y1, y2, .., yn) ∈ R^n
the vector x strictly dominates y (written x ≺ y) if xi ≤ yi for all i and xi < yi for
at least one i. Thus, the Pareto optimal design points are points for which no other
design point exists that offers "better" parameters.
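The dominance relation and a two-pass filter in the spirit of the Graef-Younes method with backward iteration can be sketched as follows. This is a simplified illustration, not the procedure from [Jah] verbatim: a forward pass discards points dominated by an already accepted point, and a backward pass over the survivors removes the remaining dominated ones. The design point values are made up.

```python
def dominates(x, y):
    """x strictly dominates y: no component worse, at least one better."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def pareto_points(points):
    """Return the Pareto optimal (minimal) elements of a list of tuples."""
    # Forward pass: accept a point unless an accepted point dominates it.
    forward = []
    for p in points:
        if not any(dominates(q, p) for q in forward):
            forward.append(p)
    # Backward pass: revisit the survivors in reverse order.
    minimal = []
    for p in reversed(forward):
        if not any(dominates(q, p) for q in minimal):
            minimal.append(p)
    return minimal


# Hypothetical design points (dr, dw, Pm).
designs = [(5, 3, 8), (4, 3, 8), (6, 1, 8), (4, 4, 4), (7, 7, 16)]
pareto = pareto_points(designs)
```

Here (5, 3, 8) passes the forward pass but is removed in the backward pass because (4, 3, 8) dominates it, while (7, 7, 16) is already rejected in the forward pass.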
The set of Pareto optimal design points may, however, contain more than one
element. Each of the elements is minimal and thus objectively equally "good" in
comparison to the other design points under the criteria of Pareto optimality. However,
one design point needs to be chosen to create the reconfiguration system. In order
to find a configuration the framework uses the following formula to rate each Pareto
optimal point by a set of user defined rating parameters.
Each parameter is valued by the function fr which takes a maximum value vmax
for the input parameter v.
\[ f_r(v_{max}, v) = \mathbf{1}_{[0, v_{max}]}(v) \cdot \left( 1 - \frac{v}{v_{max}} \right) \]
The function fr is depicted inside Figure . This basic rating function lets the
system developer specify a maximum bound for every design point parameter. The
value of each parameter linearly decreases the closer it gets to its maximum value.
The value of a design point is the weighted sum of the value of each parameter and
the provided memory space Pm for the reconfiguration:
\[ f_v(d_r, d_w, P_m) = \lambda_{d_r} \cdot f_r(d_{r_{max}}, d_r) + \lambda_{d_w} \cdot f_r(d_{w_{max}}, d_w) + \lambda_{P_m} \cdot f_r(P_{m_{max}}, P_m) \]
Using the coefficients λdr, λdw and λPm the system designer can specify the weightings
for the values of the parameters. For some systems the flash wearout will be less
important than the reconfiguration delay and memory space required. This could
be modeled by the relation λdr ≈ λPm >> λdw, which will favor design points with
smaller reconfiguration delay and memory space requirements.
The value of fv lies inside the interval [0, λdr + λdw + λPm [. The minimum is reached
if all parameters exceed their maximum value. The supremum λdr + λdw + λPm,
however, cannot be reached, as all parameters would have to be 0 for this to
happen.
Figure : The design point parameter rating function fr. The function value fr linearly
decreases from 1 to 0. vmax acts as a delimiter of the value v.
Under the set of Pareto optimal points the function fv is used as a tie-breaker by
taking the design point which maximizes fv. By using the Pareto optimal points, a
set of objectively good points is chosen. By the use of the tie-breaker the designer
may then, subjectively, configure the system depending on the requirements of the
system.
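The rating function and the tie-breaker can be written down in a few lines of Python. The maxima, weights and design points below are invented for illustration; the weights model a system in which the flash wearout matters less than the blocking time and the memory space.

```python
def rate(v_max, v):
    """Rating function fr: 1 at v = 0, linearly down to 0 at v = v_max."""
    return 1.0 - v / v_max if 0 <= v <= v_max else 0.0


def design_value(point, maxima, weights):
    """Weighted sum fv of the ratings of (dr, dw, Pm)."""
    return sum(w * rate(m, v) for w, m, v in zip(weights, maxima, point))


def tie_break(points, maxima, weights):
    """Select the Pareto optimal point with the highest value fv."""
    return max(points, key=lambda p: design_value(p, maxima, weights))


# Hypothetical Pareto optimal design points (dr, dw, Pm).
candidates = [(40, 10, 32), (20, 40, 48)]
maxima = (100, 50, 64)            # user defined upper bounds
weights = (1.0, 0.2, 1.0)         # low weight for the flash wearout
best = tie_break(candidates, maxima, weights)
value_of_best = design_value(best, maxima, weights)
```

With these numbers the first candidate is chosen: its higher blocking time is outweighed by the smaller memory requirement.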
. summary
This chapter describes the process of optimizing the extracted components with
respect to the target system parameters. In the first step components are merged
whenever the reconfiguration overhead does not increase. This ensures that small
components do not block reconfiguration slots and thus prolong the reconfiguration
delay. This step involves calculating the worst case reconfiguration delay, which is
handled by doing a path enumeration on the context sensitive control flow graph
of the application. As the number of paths inside an application increases
exponentially with the size of the application, a set of conditions is given which
keeps the path enumeration feasible for most applications. The last
part of the chapter concentrates on the design space exploration in order to find
an optimal solution for the configuration of the reconfiguration system. All possible
combinations of total amount of memory and maximum component size are evalu-
ated. After Pareto Optimization the remaining design points are subject to a rating
function which is used as a tie-breaker to determine the final design of the system.
9
BINARY TRANSFORMATION
Inside this chapter the binary transformation of the original system (given as a set
of ELF Object files) into the reconfigurable system is described.
. modification flow
The overall process of the binary transformation is depicted in Figure . After the
binary analysis and the identification of components the first step is the extraction
of the components from the object files. In the same step the remaining code inside
the object files needs to be modified, which is called Binary Transformation. All
references need to be updated to the new addresses of the basic blocks inside the
object files. This includes branches, loads from the .text section of the object file
and the relocation symbol offsets inside the symbol table. Additional symbols are
created for the reconfiguration edges.
This modification of the object files needs to be safe in the sense that all instructions
and data words need to be known explicitly. Uncertainties will make the static
process of the modification impossible. Object files marked as unsafe will, thus, not
be modified.
The approach needs to ensure that the structural integrity of the application is
maintained during reconfiguration. It is mandatory that all references are redirected
correctly. As the approach proposed in this thesis tries to reduce the runtime over-
head of the reconfiguration, all the modifications are done off-line by modifications
on the object files.
After the object files have been modified they are reintroduced into the standard
linking flow. The linker will produce the final binary with reconfiguration support.
The final executable binary is then read by the reconfiguration framework to extract
the absolute addresses of the reconfiguration symbols added to the object files. These
symbols are then used to link the binary code of the components.
Figure : Overview of the binary transformation step. The object files are transformed
and inserted back into the linking step. The final binary is used to update the
references inside the reconfiguration components.
. elf file modification
Given the final components, every block n ∈ Si has to be removed from its
corresponding object file. To this end, the basic blocks of the object file are parsed in a
linear manner and each block is removed if it is to be contained inside a reconfiguration
component. As a result, all following basic blocks move to lower addresses. This process
continues until all basic blocks are parsed. The framework uses the libelf library to modify
the executable .text area, the symbol and the relocation tables of the object files to
reflect the changes made on the control flow graph. The major part of the rewriting
process involves modifying all instructions that reference other basic blocks inside
the object files. For the ARMv(T) ISA this involves changing the following set of
eleven different instructions specified in Table  (see [ARMa] for details on the
instruction types).
For each of the instructions the corresponding offset to the basic block referenced
needs to be recalculated and changed. However, this only needs to be done for object
files that need to be modified.
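How much a PC-relative displacement changes follows directly from the removed address ranges. The sketch below deliberately works on plain addresses and does not encode or decode any ARM instruction; the helper names and the example addresses are hypothetical. Since the pipeline bias of the ISA shifts the old and the new displacement equally, only the difference has to be computed.

```python
def removed_bytes_below(addr, removed_ranges):
    """Number of removed bytes located completely below addr."""
    return sum(size for start, size in removed_ranges if start + size <= addr)


def displacement_delta(branch_addr, target_addr, removed_ranges):
    """Value to add to the old displacement (target - branch) of a reference.

    Both addresses are old addresses inside the .text section and must lie
    in basic blocks that remain in the object file.
    """
    return (removed_bytes_below(branch_addr, removed_ranges)
            - removed_bytes_below(target_addr, removed_ranges))


# Hypothetical object file: two basic blocks are moved into components.
removed = [(0x10, 8), (0x30, 4)]          # (old start address, size)
forward = displacement_delta(0x08, 0x40, removed)    # branch over both gaps
backward = displacement_delta(0x44, 0x20, removed)   # branch back over one gap
```

A forward branch across removed code becomes shorter, while a backward branch across removed code becomes less negative.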
Instruction            Mnemonic   Encoding Type
Branch                 B          T-B, T-B, A-B
Branch and Link        BL         T-BL, A-BL
Load Register          LDR        T-LDR, A-LDR
Load Byte              LDRB       A-LDRB
Load Halfword          LDRH       A-LDRH
Load Signed Byte       LDRSB      A-LDRSB
Load Signed Halfword   LDRSH      A-LDRSH
Table : Instructions that need to be modified inside the binary rewriting process for the
ARMv(T) ISA.
Additionally, the symbol and relocation tables need to be changed. Symbols and
relocatable instructions may now be located at different positions inside the
executable area of the object file, so the table entries are updated with the new
positions. Some symbols and relocation entries may even be removed because the basic
blocks that referenced these symbols no longer exist, which also removes the
dependency on the object files defining them. The result of such a rewriting process
on a relocation table can be seen in Table . During the process, entries five, six and
eight have been removed from the relocation table as the basic blocks containing these
relocatable instructions have been deleted. For all other entries the offset has been
updated. The basic block removal also causes symbols inside the symbol table of the
binary to be changed or removed. Consequently, linking this object no longer depends
on the removed symbols, which had to be provided by some other object files.
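The relocation table update can be sketched in the same simplified model. The
following Python fragment is only an illustration under assumptions: relocation
entries are reduced to (offset, symbol) pairs, removed basic blocks are given as
half-open byte ranges, and the function name rewrite_relocations as well as all
offsets and symbol names are invented for the example.

```python
def rewrite_relocations(relocs, removed_ranges):
    """Drop relocation entries inside removed blocks and shift the others.

    relocs         -- list of (offset, symbol) pairs
    removed_ranges -- sorted, non-overlapping half-open ranges (start, end)
    """
    result = []
    for offset, symbol in relocs:
        if any(start <= offset < end for start, end in removed_ranges):
            continue  # the relocatable instruction was deleted with its block
        # every removed range located completely in front of the entry moves it down
        shift = sum(end - start for start, end in removed_ranges if end <= offset)
        result.append((offset - shift, symbol))
    return result


# Invented example: one removed basic block covering the bytes 0x0c to 0x17.
relocs = [(0x04, "sym_a"), (0x10, "sym_b"), (0x1C, "sym_c"), (0x24, "sym_d")]
rewritten = rewrite_relocations(relocs, removed_ranges=[(0x0C, 0x18)])
still_needed = {symbol for _, symbol in rewritten}
```

Symbols that no longer appear in still_needed are candidates for removal from the
symbol table, provided no other part of the object file references them.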
.. Instrumentation Code
In order to implement the control flow for reconfiguration edges between components,
or between the components and the Mandatory Set, instrumentation code blocks need to
be inserted. The required modifications can be characterized by four types of control
flow, each of which needs to be handled in a distinct manner.
Before rewriting:
Nr  Offset  Type             Sym. Name
            R_ARM_THM_CALL   htons
    c       R_ARM_THM_CALL   ethar_ip_input
            R_ARM_THM_CALL   pbuf_header
    a       R_ARM_THM_CALL   ip_input
            R_ARM_THM_CALL   ethar_ip_input
    e       R_ARM_THM_CALL   pbuf_header
    e       R_ARM_THM_CALL   libprintf
            R_ARM_THM_CALL   ip_input
After rewriting:
Nr  Offset  Type             Sym. Name
            R_ARM_THM_CALL   htons
    c       R_ARM_THM_CALL   ethar_ip_input
            R_ARM_THM_CALL   pbuf_header
    a       R_ARM_THM_CALL   ip_input
            R_ARM_THM_CALL   libprintf
Table : An example relocation table before and after the binary rewriting process.
The first type of modification is depicted in Figure  and describes how a control
flow between two successive basic blocks n1 and n2 is handled. It makes no difference
whether n1 is inside a component or inside the mandatory set. Instrumentation code
is added to the object file of n1 directly following node n1. This instrumentation
code adds a call to the reconfiguration handler, which in turn will transfer the
control flow to node n2. The context of the original control flow is not altered.
The second type of modification handles the control flow from a component into the
Mandatory Set as depicted in Figure . As node n1 is moved into a component, the
control flow to the formerly successive basic block n2 needs to be handled by an
additional instrumentation code block inside the component. While the original
control flow did not contain any branch, the handler adds a branch to node n2 inside
the mandatory set. As the physical location needs to be known for this branch, an
additional relocation symbol is added to the relocation table of the corresponding
object file. This allows the reconfiguration framework to read the physical location
of the basic block from the symbol table of the linked binary as long as the linker
does not remove this information. This can, however, be assumed as all linkers allow
such information to be retained. The user may remove such information from the binary
after the final component linking step.
Figure : Addition of instrumentation code for a reconfiguration edge between two
successive basic blocks. (Diagram labels: handler, reconf.)
Figure : Addition of instrumentation code for control flow into the Mandatory Set. An
additional symbol needs to be created to identify the absolute address of node
n2 inside the final binary. (Diagram labels: __retsym_#:, handler.)
The third type of modification, as depicted in Figure , differs from the other
types in that the reconfiguration edge between node n1 and n2 is an actual branch.
The code and the instructions inside the object files are already scheduled and the
framework tries to maintain this schedule. However, the instrumentation code which
triggers the call to the reconfiguration manager needs to be inserted within branch
distance of node n1. The framework thus tries to find a suitable location inside the
object file which does not destroy the linear control flow of other basic blocks.
Suitable locations are, e.g., the end of a method or right before/after a block of
data words.
The last type of control flow that needs to be treated in a distinct way is caused by
return statements which may jump back into a component. If the function return is not
intercepted and the component that the return statement tries to jump back to has
been replaced, the result would be a system failure. This kind of control flow cannot
simply be handled by a static branch inserted at the point of function return. The
control flow of function return statements depends on the calling context and may
have multiple return locations. Thus, the return location and the corresponding
component to return to are only known at runtime.
Figure : Addition of instrumentation code for branches to components. The handler
basic block needs to be inserted inside the object file at a suitable location
inside the branch distance of node n1. (Diagram labels: handler, reconf,
un-/conditional/call edge.)
In order to handle this situation, the corresponding call edges are treated specially.
The reconfiguration manager stores the return address of a function call on a sepa-
rate stack and sets the return address to a handler function inside the reconfigura-
tion manager. This way the reconfiguration handler intercepts all (and only) control
flows of return edges into reconfigurable components. The corresponding change of
control flow is depicted inside Figure . Upon function return the original return
address is restored, the component loaded if necessary and the control flow returns
to node n2.
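The interception of return edges can be sketched as a small simulation. The Python
fragment below is not the implementation of the reconfiguration manager; it only
mirrors the described behavior under assumptions. The class name, the method names,
the string used as handler address and the set modeling the loaded components are all
hypothetical.

```python
class ReconfigurationManagerModel:
    """Toy model of the return interception with a separate return stack."""

    HANDLER = "return_handler"  # stands in for the address of the handler function

    def __init__(self):
        self.return_stack = []  # separate stack holding the original return addresses
        self.loaded = set()     # components currently present in memory

    def intercept_call(self, return_address, component):
        """Store the original return address and hand out the handler address."""
        self.return_stack.append((return_address, component))
        return self.HANDLER

    def handle_return(self):
        """Restore the original return address, loading the component if needed."""
        return_address, component = self.return_stack.pop()
        if component not in self.loaded:
            self.loaded.add(component)  # stands in for the actual reconfiguration
        return return_address


manager = ReconfigurationManagerModel()
patched_return = manager.intercept_call(0x2000, "S1")  # call leaves component S1
restored_return = manager.handle_return()              # callee returns
```

Because the original return address is kept on a stack, nested calls out of
components are restored in the reverse order of the calls.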
Many reconfiguration edges will have the same target nodes. Instead of inserting
instrumentation code for every reconfiguration edge the framework tries to reuse
existing handlers if the target node is the same. However, if the relative distance
to the handler code is too big to be implemented as a single jump instruction, the
instrumentation code is duplicated.
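The reuse rule can be sketched as follows. The Python fragment assumes a simplified
placement policy in which a new handler is placed directly at the address of the
branching node, and it takes the maximum jump distance of 512 bytes from the range
stated for the ARM Thumb ISA. The function name assign_handlers and the example
addresses are hypothetical.

```python
def assign_handlers(edges, max_distance=512):
    """Reuse a handler for the same target if it is reachable, else duplicate it.

    edges -- list of (source_address, target_node) reconfiguration edges
    """
    handlers = {}    # target node -> addresses of the handlers created for it
    assignment = []  # (source_address, target_node, handler_address)
    for source, target in edges:
        reachable = [h for h in handlers.get(target, ())
                     if abs(h - source) <= max_distance]
        if reachable:
            handler = reachable[0]  # existing handler within branch distance
        else:
            handler = source        # simplification: place a new handler at the source
            handlers.setdefault(target, []).append(handler)
        assignment.append((source, target, handler))
    return handlers, assignment


# Three reconfiguration edges to the same target node n2.
edges = [(0x100, "n2"), (0x180, "n2"), (0x900, "n2")]
handlers, assignment = assign_handlers(edges)
```

In this example the second edge reuses the handler of the first one, while the third
edge is too far away and receives a duplicate.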
Footnote: For the ARM Thumb ISA jump instructions provide a maximum of bits for the
jump offset, resulting in a jump offset in the range of [−512, 512] bytes.
Figure : Interception of return statements into components. The return address is
modified prior to the corresponding function call to allow the reconfiguration
manager to intercept the return. (Diagram labels: return, reconf manager, return.)
.. Data Duplication
Very often data words are stored in between the executable code of a component.
Optimizing compilers place these data words in a way which allows the data to be
accessed from a maximum number of locations referencing it. However, after splitting
up the code into multiple components, the executable code may no longer be able to
access a data word contained inside another component. This dependency of a basic
block loading a data word from another basic block is called a reference. References
between components generally need to be avoided, as they would require a
reconfiguration in order to allow the component to fetch a data word from the other
component, although the code of that component might not even be executed.
The solution to this problem is the duplication of the data words into all components
which reference them. This is possible without restriction as the data words are read
only. Thus, no dynamic consistency problems arise at runtime. The duplicated
data words and relocation entries, however, introduce additional overheads.
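The effect of the duplication on the footprint can be estimated with a short sketch.
The Python fragment below assumes that references are given as (basic block, data
word) pairs, that every basic block belongs to exactly one component, and that a data
word occupies four bytes. The function name duplicate_data and the example values are
hypothetical.

```python
def duplicate_data(references, component_of, word_size=4):
    """Collect the data words every component needs and the resulting overhead.

    references   -- list of (basic_block, data_word) pairs
    component_of -- mapping from basic block to the component containing it
    """
    per_component = {}
    for block, word in references:
        per_component.setdefault(component_of[block], set()).add(word)
    copies = sum(len(words) for words in per_component.values())
    distinct = len({word for _, word in references})
    overhead = (copies - distinct) * word_size  # bytes added by duplicated words
    return per_component, overhead


# Two components referencing the same read-only literal lit0.
references = [("b1", "lit0"), ("b2", "lit0"), ("b3", "lit1")]
component_of = {"b1": "S1", "b2": "S2", "b3": "S2"}
per_component, overhead = duplicate_data(references, component_of)
```

The overhead computed here only covers the data words themselves; the additional
relocation entries mentioned above are not modeled.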
.. Additional Linker Symbols
Figure : Insertion of section symbols to ensure the correct linking process of
relocation entries inside the reconfiguration components. (Diagram labels: the .bss
sections of the object files obj1 and obj2, the merged .bss section, and the inserted
symbols obj1.bss and obj2.bss.)
Object file references are local references in the context of the sections of the
object file. These sections are merged inside the final binary and only the merged
section is visible after linking. The reconfiguration component linking process,
however, needs to replace relocation entries with their corresponding absolute memory
address, which is based on an offset inside the local object file section context. In
order to calculate the final offset into the sections of every component, a symbol
referencing the local section offset is added to the symbol table of every object
file. Figure  illustrates this process. For all object files, symbols for all
referenced sections are added to the symbol table of the object file. After linking
the modified object files, the references can be calculated by taking the
corresponding symbol values of the referenced section. This way it is possible to
identify the physical memory locations for relocation entries inside the components.
This step is mandatory, as it ensures the correctness of memory accesses to sections
such as the heap.
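The address calculation enabled by the section symbols can be sketched in a few lines.
The Python fragment assumes that the value of every inserted section symbol has
already been read from the symbol table of the linked binary. The symbol values, the
tuple layout of a reference and the function name resolve_reference are invented for
illustration.

```python
def resolve_reference(reference, section_symbols):
    """Absolute address = value of the inserted section symbol + local offset."""
    object_file, section, local_offset = reference
    return section_symbols[(object_file, section)] + local_offset


# Hypothetical symbol values after linking: start of each object's .bss part.
section_symbols = {
    ("obj1", ".bss"): 0x20000000,
    ("obj2", ".bss"): 0x20000040,
}
address = resolve_reference(("obj2", ".bss", 0x8), section_symbols)
```

Without the inserted symbols only the start of the merged section would be known, so
the local offset 0x8 of obj2 could not be mapped to a physical address.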
References into string tables of the object files introduce additional problems. Link-
ers try to optimize string tables in the process of string table consolidation. Usually
duplicate strings and substrings are identified by the linker. Duplicate strings are
removed. Substrings are identified and referenced as appropriate. This, however,
changes or completely removes the offsets into the local string table of an object,
making the identification based on the section offsets hard. In order to find the
absolute address of those strings, for every string referenced by a component a cor-
responding symbol table entry is created. The linker is then forced to store the
corresponding absolute address for the referenced strings inside the symbol table
after optimization.
. summary
This chapter describes the modifications made to the original set of object files. It
lists all modifications that need to be done in order to transform the application
into a reconfigurable application. The first part concentrates on the modifications
made to enable the reconfiguration inside the Mandatory Code. The second part
lists the possible control flows and how they are handled between components and
the Mandatory Code. The last part focuses on the practical problems which occur
during the link process of the components. All modifications are done at link time,
allowing the components to be loaded and used at runtime without linking them
again. All added instrumentation code references other components through the
reconfiguration manager, which transfers the control flow between components
according to their current location.

Part III
EVALUATION

10
EVALUATION
This chapter covers the evaluation of the binary reconfiguration approach proposed in
this thesis. The first part introduces the main evaluation scenario used for measuring
the performance and the overheads of the approach.
The second part gives an overview of the annotation ratio of the high level
annotated control flow graph, as it is a basic indicator of how well parts of the
application can be identified by using the constraint system. The chapter then
discusses different performance characteristics of the approach and the result of the
design space exploration for the reconfiguration scenario used in this chapter. The
following sections . to . are based on the publication [BGOa].
. case study - smartcard ipstack
In cooperation with an industrial partner the reconfiguration approach has been
evaluated using an Internet-Protocol stack library for a smart-card using an ARMv
processor. The different protocols supported by the Internet-Protocol stack library,
consisting of implementations for the Internet Protocol Version  [ipvb] (IPv), Ver-
sion  [ipva] (IPv), the Transmission Control Protocol [tcp] (TCP), User Datagram
Protocol [udp] (UDP) and a Transport Layer Security [tls] (TLS) implementation,
offered the possibility to evaluate the approach inside a realistic industrial environ-
ment. As the reconfiguration performance depends on the bandwidth provided to
the reconfiguration server a modern SFSCI evaluation smart-card from Samsung
has been used, which provided a USB connection channel to the terminal. In order
to allow communication using standard Internet protocols the Ethernet Emulation
Mode (EEM) class for USB needed to be implemented to tunnel Ethernet packets
over the USB channel.
The application scenario consisted of a smart-card web-server offering services to
store personal identities and authenticate against web services. Connections could
be established to the web-server using different protocol combinations. While most
communication channels used TLS over TCP and IPv, other communication channels used
the IPv protocol or utilized UDP without TLS. This broad set of different control
flows inside the protocol stack, based on the corresponding protocols used, made this
application a very suitable evaluation scenario. The complete binary size of all
binary objects inside the case study amounted to  bytes.
Before the reconfiguration runtime performance is evaluated the design time over-
head introduced by the reconfiguration tool-chain is analyzed in the following sec-
tion.
.. Design Time Overhead
A prototype of the binary reconfiguration framework has been implemented in Java. It
offers a silent batch-mode for automatic modification of binary objects based on the
constraint set provided by the user. An interactive GUI is provided as well, to allow
the visualization and modification of extracted components if manual support is
needed. The binary analyzer and rewriter supports the ARMv ISA, including both
THUMB and ARM instructions. Support for other architectures can be integrated easily,
although the time needed to implement handlers for the instructions of a new ISA
scales linearly with the complexity of the ISA. If an instruction is not supported by
the framework, the binary analysis step will not be able to forward substitute
expressions containing the instruction.
The application used inside the evaluation scenario has been processed by the
reconfiguration framework on a single core Linux computer with a , GHz Pentium
processor. The Java Virtual Machine was provided with  GB of heap space. The
execution time of the design flow steps can be seen in Table . The most
time-consuming part with about  seconds is the component optimization step, which
includes the calculation of the worst case reconfiguration delay for every design point
of the design space exploration process. The Data-Flow Analysis, which annotates
the CFG with high level constraints and resolves indirect branches, is the second
most time-consuming step.
Altogether the complete execution time of the framework stayed under three minutes,
which is a reasonable time frame.
Design Flow Step Execution Time
Header Analysis  ms
CFG Generation  ms
DF Analysis  ms
Constraint Checking  ms
Component Identification  ms
Binary Rewriting  ms
Component Optimization  ms
Table : Execution time of the design flow steps for the example scenario.
.. Reconfiguration Manager Binary Overhead
As the reconfiguration itself adds new executable code to the original binary, it is
very important to keep this additional code as small as possible. The implementa-
tion of the reconfiguration manager including the interface implementation and the
replacement function added  bytes of code to the application. Inside the example
scenario the communication stack of the operating system could be reused resulting
in a small reconfiguration manager.
The instrumentation code of a single handler added to the binary code varies be-
tween  and  bytes in size depending on the type of control flow.
.. Component Extraction
The XML configuration file contained the constraints shown in Listing . They
were passed to the constraint solver with the goal of extracting the TCP, IPv and
TLS components from the application in order to reuse these components inside
the reconfiguration process. Lines two and four describe constraints to identify the
control flow to the TCP component, line six specifies the control flow to the IPv
component and the symbol constraint in line eight describes an entry point to the
TLS component. Table  shows the size of the extracted components Si after using
Algorithm  and the component merging process as described in Section ...
Component Component Size Complete Size Percentage
S1 (TLS)   , %
S2 (IPv)   , %
S3 (TCP)   , %
Table : Extracted component sizes in bytes after the component merging process.
 /* IPv4 to TCP control flow */
 [ip4_input]
 (ip4_hdr._ttl_proto & 0xff) != 0x6

 /* ETH to IPv6 control flow */
 [ethernet_input]
 eth_hdr.type != 0x86dd

 /* IPv6 to TCP control flow */
 [ip6_input]
 ip6_hdr.nexthdr != 0x6

 /* Globally valid constraints */
 [__global__]
 @tls_recv
Listing : Constraint set used for the evaluation example to extract the
IPv, TCP and TLS Components.
Using this simple constraint set, it was possible to extract  percent of the TLS
implementation code to be used inside a reconfiguration component. The remaining
bytes of the implementation may be extracted with a more sophisticated constraint
set, as not all control flows are covered by the set of Listing . A similar statement
holds true for the TCP and IPv components in Table , for which the percentage is
lower. This is due to the fact that the constraints only restrict control flow from the
lower Ethernet packet layer. Control flow from higher layers, such as the application
layer, was not considered by the constraint set. Adding such constraints is, however,
possible without restriction.
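How a constraint of the listing marks an entry point can be illustrated with a small
sketch. The Python fragment below replaces the bit vector constraint solver by an
exhaustive enumeration of a 16 bit header field, which is only feasible for such a
tiny example. The branch names and their conditions are hypothetical, and the protocol
numbers 0x6 (TCP) and 0x11 (UDP) are the standard IP protocol numbers.

```python
def infeasible_branches(constraint, branches, domain):
    """Return the branches that can never be taken while the constraint holds."""
    allowed = [value for value in domain if constraint(value)]
    return [name for name, condition in branches
            if not any(condition(value) for value in allowed)]


# User constraint from the listing: the protocol field of the IPv4 header is not TCP.
def constraint(ttl_proto):
    return (ttl_proto & 0xFF) != 0x6


# Hypothetical conditional control flows found inside ip4_input.
branches = [
    ("to_tcp_input", lambda value: (value & 0xFF) == 0x6),
    ("to_udp_input", lambda value: (value & 0xFF) == 0x11),
]

entry_points = infeasible_branches(constraint, branches, range(1 << 16))
```

Only the branch towards the TCP input function contradicts the constraint, so it is
the one reported as a candidate entry point into a reconfigurable component.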
Time Interval WC Value (µs)
trm 
tx 
tr 
terase (per page) 
tissue 
tos 
tack 
tdata (per byte) 
Table : System Timings for the evaluation scenario running on the SFSCI smart-card
with an ARMvt processor at  Mhz Clock Frequency.
.. Reconfiguration Delay Function
In order to perform the design space exploration as part of the component
optimization, the system timings, as described by the reconfiguration protocol in
Section ., need to be known. The timings have been measured by executing the worst
case path. The OS timings must not be interpreted as precise estimates, as the
evaluation scenario was running a Linux OS without real-time support. However, the
Internet Protocol stack implementation on the smart-card is completely deterministic.
The measured execution times on the smart-card are, thus, deterministic.
In the following let dh be the number of bytes used by the lower layer protocols for
each reconfiguration packet transferred and sp the maximum number of data bytes
transferred per packet. The Reconfiguration Delay Function for the smart-card
application, used for calculating the worst case reconfiguration delay during
the design space exploration, is then given as follows:
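\[ \kappa_c : E^{*} \rightarrow \mathbb{N} = \sum_{S_i \in E} \mathrm{LRU}(S_i) \cdot (t_{static} + t_{full} + t_{residue}) \]
with the equations
\[ t_{static} = t_{rm} + t_x + t_{issue} \]
\[ t_{full} = \left\lfloor \frac{s_i}{s_p} \right\rfloor \cdot \left( t_{os} + (d_h + s_p) \cdot t_{data} + t_r + t_{erase} \cdot \frac{s_p}{P_f} + s_p + t_x + t_{ack} \right) \]
\[ t_{residue} = \left( t_{os} + (d_h + s_{residue}) \cdot t_{data} + t_r + t_{erase} \cdot \left\lceil \frac{s_{residue}}{P_f} \right\rceil + s_{residue} + t_x \right) \]
\[ s_{residue} = s_i \bmod s_p \]
given from Section ..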
The reconfiguration delay function κc sums up the delays caused by every component
that needs to be loaded. Whether a component is loaded or not is simulated by the
LRU function. The function returns 1 whenever the component is not currently
loaded, and 0 otherwise. The loading of a component is based on the reconfiguration
protocol given in Section .. It is split into three parts. The static part tstatic
represents the time needed to send the reconfiguration request to the server. The
time interval tfull measures the time needed to transfer the ⌊si/sp⌋ complete
reconfiguration packets of maximum size. The last time interval, tresidue, represents
the time needed to transfer the residue of the component to the smart-card.
The corresponding worst case time intervals have been measured using the hardware
clock of the smart-card and are listed inside Table . The value terase belongs to
the process of reprogramming a single flash page: every page needs to be erased
before it can be rewritten, and terase is the time it takes to erase a page. The
time interval tstore is then given as the combination of erasing a page and writing
the data back into the page.
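The delay function can be transcribed into a short Python sketch, term by term as the
equations are printed above. Several simplifications are assumptions of this sketch:
the LRU function is replaced by a fixed set of components that are already loaded, the
timing values are placeholders rather than the measured values of the table, and the
function name reconfiguration_delay is hypothetical.

```python
import math


def reconfiguration_delay(component_sizes, loaded, t, d_h, s_p, P_f):
    """Worst case delay for loading all components that are not yet loaded.

    component_sizes -- mapping component name -> size s_i in bytes
    loaded          -- set of component names for which LRU(S_i) is 0
    t               -- timing values keyed by rm, x, r, erase, issue, os, ack, data
    """
    t_static = t["rm"] + t["x"] + t["issue"]
    total = 0.0
    for name, s_i in component_sizes.items():
        if name in loaded:
            continue  # LRU(S_i) = 0, the component causes no delay
        s_residue = s_i % s_p
        # floor(s_i / s_p) complete packets, terms as printed in the equation
        t_full = (s_i // s_p) * (t["os"] + (d_h + s_p) * t["data"] + t["r"]
                                 + t["erase"] * s_p / P_f + s_p
                                 + t["x"] + t["ack"])
        # residue packet, terms as printed in the equation
        t_residue = (t["os"] + (d_h + s_residue) * t["data"] + t["r"]
                     + t["erase"] * math.ceil(s_residue / P_f) + s_residue
                     + t["x"])
        total += t_static + t_full + t_residue
    return total


# Placeholder timings of 1 for every interval; these are not measurements.
timings = {key: 1.0 for key in
           ("rm", "x", "r", "erase", "issue", "os", "ack", "data")}
delay = reconfiguration_delay({"S1": 1024}, set(), timings,
                              d_h=44, s_p=512, P_f=256)
```

The parameters d_h = 44, s_p = 512 and P_f = 256 correspond to the values used in the
evaluation; only the timing values have to be replaced by the measured worst case
intervals.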
The most time-consuming part, as seen in the table, is the time used for erasing
and writing a single flash page. This can also be seen in Figure , which depicts the
development of the blocking time based on the component size for a single component
to be loaded. The flash page size Pf for the smart-card was  bytes. Components
are transferred in packets of at most  bytes. As depicted in the figure, the blocking
time κc increases linearly between multiples of the flash page size. For every new
flash page that needs to be programmed the blocking time increases by approximately
 ms. A partition function θ, thus, should try to minimize the number of erase/write
cycles needed for a component, as they are the most time-consuming part of the
reprogramming step.
Figure : Development of the blocking time κc in ms and its parts tfull and tresidue
for different component sizes with sp = 512, Pf = 256. (Plot axes: component size si
in bytes, 512 to 2048; blocking time κc in ms, 0 to 140; curves tresidue, tfull and
κc.)
Figure  illustrates the error between the worst case blocking time function κc
and the measured values on the smart card. The values have been measured fifty
times each. The corresponding worst case value is shown as the small red dot. The
lower deviation is shown as an error-bar below. As shown, the worst case blocking
time function stays well above the real measured values. The percentage difference
between the worst case function and the measured values is illustrated in Figure .
While the absolute error grows linearly due to multiple errors adding up, the overall
percentage error stays between  and  percent even for increasing component sizes.
The function κc used for this evaluation, thus, is a reasonable estimation of the
worst blocking time for a reconfiguration context. In the following this function is
used for the design space exploration phase covered in the next section.
. design space exploration
The design space exploration was done on the components of Table . For the
purpose of component partitioning the function θobj as described in Section .
has been used. The reconfiguration protocol for the transmission of the component
parts was implemented on top of UDP, resulting in a header overhead of dh =
44 bytes. The packet size was limited to sp = 512 bytes of data, resulting in a
maximum of two flash pages per packet. The use of the unreliable UDP protocol for
communication did not have any influence on the reliability of the reconfiguration
system as the communication channel was a direct point-to-point connection over
USB. No packets were lost. The reconfiguration protocol as described in Chapter 
additionally avoided overflows of the receive buffers of the smart-card by enforcing
the acknowledgment of every reconfiguration packet.
Figure : Comparison of the worst case blocking time function κc in ms and the
measured values for different component sizes with sp = 512, Pf = 256. The lower
deviation is shown for the red (measured) values. (Plot axes: component size si in
bytes, 512 to 2200; blocking time κc in ms, 0 to 140.)
All calculated design points are listed inside the Appendix A.. The Pareto optimal
ones, however, are shown in Table . The calculation took  ms as shown
in Table . As expected, the reconfiguration delay and the flash wearout increase
with decreasing memory space. However, some local minimum exists for each of the
parameters. The worst case reconfiguration delay for this configuration always occurs
for IPv packets, which are entering the TCP and afterwards the TLS component,
basically following the ISO/OSI protocol level flow.
The binary overhead, representing the mean percentage increase in footprint of the
components due to instrumentation code, and the overall decrease in size of the
complete application are shown as well.
Figure : The error between the function κc and the measured values in percent.
(Plot axes: component size si, 512 to 2048; error in %, 0 to 16.)
For the tie breaker function the following values have been used:
\[ \lambda_{d_r} = 1.0, \quad d_{r_{max}} = 5000000 \]
\[ \lambda_{d_w} = 0.75, \quad d_{w_{max}} = 60 \]
\[ \lambda_{P_m} = 0.75, \quad P_{m_{max}} = 7000 \]
In general a good value for drmax can be calculated using the schedulability analysis.
The supremum of the worst case blocking times for which the system is schedulable is
a good reference value. This ensures that all designs which are not schedulable get a
very low value. Smaller values for drmax, however, may be beneficial in order to
decrease the latency of the system. This is incorporated into the tie breaker function
by the value λdr. A maximum blocking time of  seconds has been used inside this tie
breaker function. Lower blocking times are valued higher than the flash wear-out and
the total memory used. The flash wear-out and the memory usage have been valued
equally. This is reflected by λdr > λdw = λPm. Using these values the maximum value
fv is obtained by using the design Pm = 4608, Pc = 1536, which is highlighted inside
Table .
Pm Pc dr(µs) dw Binary Overhead Size Decrease fv
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
    , ,% ,
Table : Pareto optimal design points (Pm, Pc) of the design space exploration for the
components of Table  over the parameters dr, dw, Pm. Some additional information on
the design points, such as the binary overhead and the overall size decrease
of the system, is listed as well.
The design has been executed on the smart-card with the reconfiguration server
running on the connected terminal. A series of HTTP GET requests using TLS for
secure transportation has been sent to the smart-card. The blocking time due to
reconfiguration, which is needed in order to correctly process the request on the
smart card, has been measured and is depicted in Figure . As depicted inside the
figure, the first request encounters a blocking time of approximately 1.4 seconds,
staying well below the calculated worst case blocking time κ = 1.54 seconds. The
successive requests, however, encounter a shorter blocking time as some of the
components needed to process the request are already loaded. The reconfiguration
component slot cache is, thus, hot. The reconfiguration blocking time stayed well
under its estimation κ for all requests. The overall size of the software running on
the smart-card decreased by % with a reasonable increase in response time of one
second on average. For most smart-card applications this will be justifiable. Using
different ratings for the design parameters, however, suitable designs can be found
for most scenarios.
Figure : Measurement of the blocking time in ms for a HTTP GET request on the
SFSCI smart card using the reconfiguration approach and design Pm = 4608, Pc = 1536.
(Plot: blocking time in ms over the request number, measurement compared against κ;
measured values 1424, 982, 962 and 968 ms for requests 1 to 4.)
The next important aspect of the design space exploration is the guarantee to
meet some minimum lifetime requirement of the smart card. Every reconfiguration
rewrites a certain number of flash pages. If the maximum number of flash rewrite
cycles has been reached for a specific flash page, the system will not be able to
operate correctly any more. Thus, choosing a design with a sufficiently small value dw
can be important. Given the maximum number of flash rewrite cycles of a memory
page as fmax and a minimum inter-arrival time tmin of requests issuing a
reconfiguration on the smart-card, the lifetime T of the system under reconfiguration
is given as:
\[ T = \frac{f_{max}}{d_w} \cdot t_{min} \]
Footnote: At least the corresponding flash page cannot be used any more. An additional
memory management algorithm may be used to handle this situation. However, the
performance of the system will decrease continuously with every additional flash page
that cannot be used any more.
tmin(minutes) T (min) T (hours) T (days)
,  , ,
  , ,
  , ,
  , ,
  , ,
   ,
Table : Lifetime of the Pareto optimal design with a maximum number of flash rewrites
of fmax = 1000000 and a flash wearout of dw = 8.
Using this formula the maximum lifetime of the example scenario has been evaluated
for different values of the minimal inter-arrival time of requests to the smart
card. For the maximum number of rewrite cycles of the flash memory, the value
fmax = 1000000 has been chosen. Corresponding values can be taken from the
hardware specification of the flash manufacturer. Table  lists the result. Depending
on the minimal inter-arrival time the system lifetime under reconfiguration can be
as small as multiple days or as large as multiple years. Given a minimal required
lifetime T, a good value for the design space parameter dwmax can be obtained by
solving the equation above for dwmax:
\[ d_{w_{max}} = \frac{f_{max}}{T} \cdot t_{min} \]
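Both equations can be checked with a short worked example. The Python fragment below
uses fmax = 1000000 and dw = 8 from the evaluation, while the inter-arrival time of
one minute is an assumed example value. The function names are hypothetical.

```python
def lifetime(f_max, d_w, t_min):
    """Lifetime T in the time unit of t_min: T = f_max / d_w * t_min."""
    return f_max / d_w * t_min


def max_wearout(f_max, required_lifetime, t_min):
    """Largest admissible wearout: d_wmax = f_max / T * t_min."""
    return f_max / required_lifetime * t_min


# Assumed example: one reconfiguration issuing request per minute.
T_minutes = lifetime(f_max=1_000_000, d_w=8, t_min=1)
T_days = T_minutes / 60 / 24

# Solving for the wearout with the lifetime computed above returns d_w again.
d_w_max = max_wearout(f_max=1_000_000, required_lifetime=T_minutes, t_min=1)
```

With these values the lifetime amounts to 125000 minutes, which is roughly 86.8 days.
Since T grows linearly with tmin, an inter-arrival time of ten minutes would extend
the lifetime by a factor of ten.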
. summary
In this chapter a case study of the reconfiguration methodology was performed using
a realistic smart card scenario. The reconfiguration was applied to a smart card
webserver implementation featuring a full TCP/IP stack. The study was designed
to determine the effect of applying the reconfiguration approach to the system with
respect to the possible footprint savings. Using the constraint based component
identification approach proposed in this thesis it was possible to extract a
sufficiently big part of the implementation as reconfigurable components using a very
simple constraint set. The evaluation scenario was quite extreme, as nearly all
functionality used to process a packet completely was extracted from the system.
Nevertheless, using the parameters given by the system developer it was possible to
find a suitable design which reduced the overall footprint of the system by %. Designs
with a smaller footprint would have been possible, however, only with a much higher
worst case reconfiguration blocking time.
The exemplary design space exploration, containing the calculation of the worst case
blocking time for every design, demonstrated the applicability of the approach. It
was shown that the statically estimated blocking time stayed safely above the real
blocking time while being close enough to be viable. It was also demonstrated
how suitable values for the tie breaker function can be calculated.

11
CONCLUSION AND FUTURE WORK
This thesis presents a complete methodology to allow an embedded application,
given as binary object code, to be transformed into a reconfigurable application
with the overall goal to reduce the binary footprint of the application. In contrast
to traditional reconfigurable systems the reconfiguration approach is designed to
decrease the footprint of a legacy application while retaining its functionality and
meeting its real-time constraints. To this end, a series of problems, some of them
being related to general binary analysis, had to be solved. The next section gives a
short summary of the thesis covering the most important problems and the solutions
applied.
. thesis summary
This thesis set out to determine in which way reconfiguration mechanisms can be used
inside legacy embedded systems to decrease the binary footprint. The
proposed approach combines multiple steps, as illustrated in Figure , in order to
achieve this goal. As mentioned in the introduction, three major questions had
to be answered by the approach proposed in this thesis. In the following, the
answers given in this thesis are summed up.
Figure : The steps of the reconfiguration methodology as proposed in this thesis:
binary analysis, component extraction guided by the constraints of the system
designer, optimization and binary transformation of the ELF objects.
• How can meaningful components be extracted from binary code if no source
code is available?
The approach solves the problem of having no access to the source code by
combining a series of well known binary analysis approaches to reconstruct
the semantics of the application. The approach, however, does not require a
complete reconstruction of the application's source code. Even a partial
reconstruction allows the reconfiguration framework to add reconfiguration support
to the parts extracted by the user. Given a set of bit vector constraints by
the user, a constraint solver is used to find conditional control flows inside the
application which do not fulfill the user constraints. These control flows are
then used as entry points to reconfigurable components. The evaluation
demonstrated that it is possible to extract semantically meaningful components from
binary code using even simple constraint sets and without sophisticated knowledge
of the internals of the binary code.
• How to derive a "good" design of the reconfiguration system depending on
static deployment parameters such as memory usage and worst case execution
time?
The components extracted by the user are subject to an iterative optimization
approach. Using different design parameters, e.g., the maximum component slot
size and the maximum reconfiguration memory space, a design space exploration is
done to find the Pareto optimal design. Using a context sensitive analysis of the
ICFG, the worst case reconfiguration blocking time is calculated for every design.
Components are either merged or partitioned to comply with the design
constraints. The final design decision is based on a tie breaker function which
selects the most suitable design based on rating parameters given by the user.
• How to transform the original application into an application which supports
reconfiguration automatically?
In order to avoid manual adaptation, the approach makes use of an indirection
layer between the extracted components and the reconfiguration system. The
binary code is automatically transformed by rewriting the binary object code
using instrumentation code added to the specific control flow nodes.
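The selection step described in the second answer can be illustrated with a minimal
sketch. It assumes only two objectives (footprint and worst case blocking time) and a
weighted, normalised sum as tie breaker; the class `Design`, the weights and all
numbers are hypothetical and are not taken from the evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    footprint: int         # binary footprint in bytes (hypothetical values)
    blocking_time: float   # worst case reconfiguration blocking time in µs


def dominates(a, b):
    """a dominates b if it is no worse in both objectives and better in one."""
    no_worse = a.footprint <= b.footprint and a.blocking_time <= b.blocking_time
    better = a.footprint < b.footprint or a.blocking_time < b.blocking_time
    return no_worse and better


def pareto_front(designs):
    """Keep only the designs that are not dominated by any other design."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


def tie_breaker(designs, w_footprint, w_blocking):
    """Assumed rating: pick the design with the lowest weighted, normalised cost."""
    max_fp = max(d.footprint for d in designs)
    max_bt = max(d.blocking_time for d in designs)

    def cost(d):
        return (w_footprint * d.footprint / max_fp
                + w_blocking * d.blocking_time / max_bt)

    return min(designs, key=cost)


designs = [
    Design("A", 40_000, 900.0),
    Design("B", 46_000, 300.0),
    Design("C", 47_000, 350.0),   # dominated by B
    Design("D", 52_000, 120.0),
]
front = pareto_front(designs)
best = tie_breaker(front, w_footprint=0.5, w_blocking=0.5)
print([d.name for d in front], best.name)
```

Changing the two weights shifts the choice along the Pareto front, which is the role the
user given rating parameters play in the text above.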
The present study makes several noteworthy contributions to the field of
reconfiguration in very resource constrained embedded systems. In comparison to state
of the art approaches, the presented approach concentrates on decreasing the footprint
of the overall system and has no fixed unit of adaptation. Components used by
the approach may be as small as one single instruction or as big as multiple object
files. It was designed for, and shown to be applicable to, applications which cannot be
changed at source code level.
The designed framework assists the system developer by semi-automatically
converting the original static binary code into a reconfigurable system. Only an
intermediate layer needs to be added manually to the system, which has been shown
to be very small for the evaluation platform. The incorporation of different design
factors leads to a huge number of possible designs. Using a design space exploration,
a suitable design for the system is calculated automatically, while important
factors such as the worst case blocking time of a system task and the flash lifetime
are incorporated into the analysis.
. outlook
A further study could assess the possibility of adding new components, which were
not available at link time, using the reconfiguration approach of this thesis.
This would allow for software upgrades of legacy code parts and would decrease the
development time by interchanging functionality at runtime on the fly. As components
can be defined at literally any location inside the binary code (not only at function
boundaries), the interface of such components needs to be defined by different means.
The use of data flow analysis facts for entry and exit transitions would probably allow
for a sufficiently detailed description of the interface to ensure the code integrity
criteria.
Future research could also concentrate on developing a mechanism which assists the
user with the selection of suitable components from the legacy code. Using static
profiling techniques, valuable information on the execution frequency of program parts
may be calculated. Ihle [Ihl] proposed techniques to generate a basic block ordering
based on the execution frequency. Less frequently used basic blocks would be a
preferred choice for reconfiguration. However, additional considerations would have to
be made, as the least frequently executed basic block is not necessarily the best
choice with respect to the worst case blocking time the system may encounter.
Additional improvements could be made to the optimization algorithm. A better
partitioning function θ could be developed, which tries to minimize the worst case
reconfiguration blocking time and/or other parameters. As the optimal partitioning
of components is NP-hard, promising results could be achieved by using heuristics
such as genetic algorithms. Tests with different partitioning functions indicate that
the component partitioning has a major influence on the performance of the approach
and that better partitioning functions allow better designs to be found.
Finally, the current approach assumes the use of only one reconfiguration manager.
Parallel reconfigurations are not allowed in this design. However, adding support
for multiple reconfigurations in parallel is straightforward. A simple solution
could utilize several independent reconfiguration managers with independent
reconfiguration slots. Critical tasks may then be released from the interference of
tasks with lower criticality or priority, allowing for a much smaller worst case
blocking time in exchange for a higher memory consumption or a higher worst case
blocking time for lower priority tasks. This is essentially the same as locking
components in memory, similar to memory pages being locked for real-time tasks in most
operating systems.
Part IV
APPENDIX

A
APPENDIX
a. mathematical notation
This section briefly defines some fundamental mathematical notation as used in
the literature in order to avoid any ambiguity.
Definition A.. (Tuples):
Let $d_1, d_2, \ldots, d_n \in D$ be elements from a domain $D$. The corresponding
n-tuple is written as $(d_1, d_2, \ldots, d_n)$. An empty tuple is written .
The domain of n-tuples is written $D^n$:
$D^n := \{(d_1, d_2, \ldots, d_n) \mid d_i \in D\}$
Definition A.. (Kleene Closure):
Given a domain $D$, the Kleene closure is defined and denoted as:
$D^+ := \bigcup_{n \in \mathbb{N}} D^n$
$D^* := D^+ \cup \{\text{empty tuple}\}$
Definition A.. (Powerset):
The powerset of a set $S$, written $P(S)$, is the set of all subsets of $S$, including
the empty set and the set $S$ itself.
Definition A.. (Directed Graph):
A directed graph is a tuple (N,E) with N being a set of nodes (or vertices) and E
the set of directed edges. An edge e ∈ E = (n1,n2) describes an edge going from
node n1 to node n2.
Definition A.. (Path):
A path from n1 ∈ N to n2 ∈ N in a graph G = (N,E), represented as n1 → n2, is
a sequence of edges (n1,n2), ..., (nm−1,nm) ∈ E,m > 1.

Definition A.. (Subgraph):
A subgraph of a graph G = (N,E) is a graph G ′ = (N ′,E ′) with N ′ ⊂ N,E ′ ⊂ E ′.
If G ′ contains exactly the same edges between each pair of nodes as G the graph
G ′ is called the vertex-induced subgraph of G.
Definition A.. (Predecessor):
Given a directed graph G = (N,E) the set pred(nj) is the set of all nodes ni with
(ni,nj) ∈ E.
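The graph definitions above can be made concrete with a short sketch. It represents a
directed graph as a set of nodes and a set of edge tuples; the function names and the
example graph (a loop over the nodes a, b, c and d) are chosen for illustration only.

```python
def pred(edges, node):
    """Predecessor set: all nodes n_i with an edge (n_i, node)."""
    return {src for (src, dst) in edges if dst == node}


def has_path(edges, start, goal):
    """True if a non-empty sequence of edges leads from start to goal."""
    frontier = [start]
    seen = set()
    while frontier:
        current = frontier.pop()
        for (src, dst) in edges:
            if src != current:
                continue
            if dst == goal:
                return True
            if dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return False


def induced_subgraph(edges, subset):
    """Vertex-induced subgraph: keep every edge whose endpoints are both in subset."""
    return set(subset), {(a, b) for (a, b) in edges if a in subset and b in subset}


# Example: a loop a -> (b | c) -> d -> a
N = {"a", "b", "c", "d"}
E = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "a")}

print(sorted(pred(E, "d")))
print(has_path(E, "a", "d"))
print(induced_subgraph(E, {"a", "b", "d"}))
```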
a. additional path enumeration considerations
During the path enumeration step, loops are unrolled once to ensure that all
reconfiguration contexts which may be generated at runtime are covered by the worst
case blocking time analysis. This is a brute-force method which can end up being very
expensive. While simple loops with only one possible path inside the loop increase the
length of a path linearly without introducing additional paths, loops containing one
conditional branch already double the number of paths starting at the loop header,
therefore increasing the breadth of the search tree. This situation is depicted in
Figure . The number of paths created by one loop unrolling step equals the number
of paths contained inside the loop. The additional paths contain all combinations
of paths that may be taken during one additional loop traversal. This ensures that all
possible paths are contained inside the analysis. In general, let $p_1, \ldots, p_n$
be the distinct paths inside a loop and $|p_i|$ the length of path $p_i$. Then all $n$
paths created by the loop unrolling step are of length $\sum_{i=1}^{n} |p_i|$.
Unrolling additional loop iterations leads to an exponential increase in the number of
paths. Unrolling a simple loop containing one conditional branch 100 times leads to
$2^{100}$ additional paths. This explosion of the breadth of the search tree makes an
analysis very expensive.
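The growth described above can be reproduced with a small sketch. It uses a simplified
counting model, which is an assumption and not the enumeration algorithm of the
framework: every traversal of the loop body independently picks one of the body paths,
so `k` traversals of a loop with one conditional branch yield `2**k` paths.

```python
from itertools import product

# The two paths through the body of a loop a -> (b | c) -> d
LOOP_BODY_PATHS = [("a", "b", "d"), ("a", "c", "d")]


def unrolled_paths(body_paths, iterations):
    """All paths obtained by traversing the loop body `iterations` times."""
    result = []
    for combination in product(body_paths, repeat=iterations):
        # Concatenate the body path chosen in each iteration.
        result.append(tuple(node for body in combination for node in body))
    return result


for k in (1, 2, 3, 10):
    print(k, len(unrolled_paths(LOOP_BODY_PATHS, k)))
```

The path count doubles with every further iteration, while the length of each path
grows only linearly.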
A large number of approaches for calculating the WCET of an application exist
which try to tackle this breadth explosion caused by complex loops. Chu [CJ] uses
compounded summarization to reduce the complexity of path unrolling by extrapolating
the information of a single branch of an unrolled loop. Other approaches utilize
abstract interpretation to derive precise loop bounds by static analysis in order to
limit the breadth of the search tree while trying to remove false paths [Alt, EG, GEL].
This explosion is the reason why the worst case blocking time calculation approach
only considers one iteration of a loop and stops traversing the context sensitive
graph if too many components are contained inside a loop. Here, too many means that
additional loop iterations could lead to a preemption of other components used inside
the loop (see Section .). As only one iteration of the loop is considered, the
approach does not suffer from the exponential increase in the number of paths that
WCET analysis approaches typically suffer from. The blocking time analysis thus trades
generality for computational feasibility. During the evaluation, however, no situation
occurred which led to a termination of the algorithm because of loops containing too
many components. If this turns out to be a serious problem for some applications,
related techniques used for calculating the WCET may be used.
Figure : A loop containing a conditional branch unrolled with the loop removed after
unrolling (panels a and b, nodes a, b, c and d).
In contrast to state of the art path enumeration techniques used to calculate the
WCET of an application, only loops which contain multiple components need to
be unrolled for the calculation of the blocking time. Loops containing no or only
one component do not need to be unrolled: they cannot increase the worst case
blocking time, as only edges between components can introduce blocking times. The
framework automatically checks this condition and stops the traversal of a loop if
it is not met.
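The condition on loops can be sketched as a simple check. The sketch assumes that a
loop is given as a collection of basic block names and that a dictionary maps each
block belonging to a reconfigurable component to a component identifier; both
representations are hypothetical and only illustrate the rule stated above.

```python
def needs_unrolling(loop_blocks, component_of):
    """A loop is only relevant for the blocking time if it touches at least two
    different components, since only edges between components can block."""
    components = {component_of[b] for b in loop_blocks if b in component_of}
    return len(components) >= 2


# Blocks missing from the mapping are assumed to be mandatory (non-extracted) code.
component_of = {"b": 1, "c": 2, "x": 1}

print(needs_unrolling(["a", "b", "c", "d"], component_of))  # two components
print(needs_unrolling(["a", "b", "x"], component_of))       # one component
print(needs_unrolling(["a", "d"], component_of))            # no component
```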
a. calling convention (RC_ABI)
The reconfiguration manager provides the enter_comp routine, which allows
components or the operating system to call code inside other components. The calls are
automatically added as instrumentation code to the operating system and to the
extracted and optimized reconfiguration components. The call does not conform to the
EABI and can, thus, not be triggered from a higher level programming language. In the
following, the calling convention of the routine as implemented for the ARM ISA is
described.
Figure  illustrates the calling convention upon calling the enter_comp method.
The stack needs to contain the ID of the calling component at position sp+#8,
the original register r1 at sp+#4 and the original register r0 at the current stack
pointer address. Register r0 contains the offset to the called component (upper 24
bit), one bit to indicate if this is a function call and the component to be called (7
bit). The register r1 is utilized for the indirect branch to the enter_comp method
and, thus, contains the address of the method.
Stack layout (memory addresses increase from sp upwards):
  sp + #8    return comp #id
  sp + #4    original r1
  sp + #0    original r0 (sp)
  above the three slots: old stack slots; below sp: free slots
Register contents:
  r0 (32 bit)    offset (24 bit) | bl (1 bit) | config_id (7 bit)
  r1             enter_comp
Figure : ABI of the reconfiguration manager providing the reconfiguration indirection
mechanism.
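The packed word passed in r0 can be illustrated with a short sketch. It assumes that
the three fields are laid out from the most significant bit downwards in the order
shown in the figure (24 bit offset, 1 bit bl flag, 7 bit component ID); the function
names are hypothetical and not part of the reconfiguration manager.

```python
OFFSET_BITS = 24   # offset to the called component (upper bits)
BL_BITS = 1        # set if the indirection is a function call
ID_BITS = 7        # ID of the component to be called (lowest bits)


def pack_r0(offset, is_call, config_id):
    """Build the 32 bit word: offset | bl flag | config_id."""
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit into 24 bit")
    if not 0 <= config_id < (1 << ID_BITS):
        raise ValueError("config_id does not fit into 7 bit")
    return (offset << (BL_BITS + ID_BITS)) | (int(is_call) << ID_BITS) | config_id


def unpack_r0(word):
    """Split the 32 bit word back into (offset, is_call, config_id)."""
    offset = (word >> (BL_BITS + ID_BITS)) & ((1 << OFFSET_BITS) - 1)
    is_call = bool((word >> ID_BITS) & 1)
    config_id = word & ((1 << ID_BITS) - 1)
    return offset, is_call, config_id


print(hex(pack_r0(0x12, True, 5)))
print(unpack_r0(0x0080001))
```

Under this assumed layout, the constant 0x0080001 used in the listings decodes to
offset 0x800, a cleared bl flag and component ID 1.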
A. calling convention (RC_ABI) 
The reconfiguration approach needs additional  bytes on the stack to implement
the indirection mechanism. The size of the instrumentation code depends on the control
flow type. The instrumentation code, which is added to the binary automatically, is
depicted in the following listings. Please be aware that alignment restrictions need
to be met in order to insert any of these code blocks.
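push {r0,r1,r2}       // save r0-r2; r0 and r1 end up at sp+#0 and sp+#4
ldr r0, [pc,#4]       // r0 = packed word below (offset | bl flag | config_id)
ldr r1, [pc,#4]       // r1 = address of the reconfiguration manager routine
bx r1                 // indirect branch into the reconfiguration manager
.word 0x0080001       // offset | bl flag | config_id
.word enter_config    // reconfiguration manager symbol
Listing : THUMB indirection to another component without returning.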
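push {r0,r1,r2}       // save r0-r2; r0 and r1 end up at sp+#0 and sp+#4
mov r0, #myid         // r0 = ID of the calling component
str r0, [sp, #8]      // store the caller ID in the stack slot at sp+#8
ldr r0, [pc,#4]       // r0 = packed word below (offset | bl flag | config_id)
ldr r1, [pc,#4]       // r1 = address of the reconfiguration manager routine
bx r1                 // indirect branch into the reconfiguration manager
.word 0x0080001       // offset | bl flag | config_id
.word enter_config    // reconfiguration manager symbol
Listing : THUMB indirection to another component with return.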
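sub sp, #8            // reserve two stack slots
str r0, [sp,#0]       // save r0 in the lower slot
ldr r0, [pc, #4]      // r0 = target address (literal word below)
str r0, [sp, #4]      // place the target address in the upper slot
pop {r0,pc}           // restore r0 and branch to the target address
nop                   // padding, keeps the literal word 4-byte aligned
.word address         // absolute target address in the mandatory code
Listing : THUMB indirection to Mandatory Code without return.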
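push {r0,r1,r2,r3}    // save the registers used by the indirection
nop                   // padding, keeps the literal words 4-byte aligned
ldr r2, [pc,#4]       // r2 = target address (first literal word below)
mov r0, #myid         // r0 = ID of the calling component
ldr r1, [pc,#4]       // r1 = address of bl_to_abs (second literal word below)
bx r1                 // indirect branch to bl_to_abs
.word address         // absolute target address in the mandatory code
.word bl_to_abs
Listing : THUMB indirection to Mandatory Code with return.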
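ldr pc, [pc, #-4]     // load the following word into pc (pc reads as this address + 8)
.word address         // absolute target address in the mandatory code
Listing : ARM indirection to Mandatory Code without return.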
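push {r0,r1,r2,r3}    // save the registers used by the indirection
ldr r2, [pc,#8]       // r2 = target address (first literal word below)
mov lr, #myid         // lr = ID of the calling component
ldr r1, [pc,#4]       // r1 = address of bl_to_abs (second literal word below)
bx r1                 // indirect branch to bl_to_abs
.word address         // absolute target address in the mandatory code
.word bl_to_abs
Listing : ARM indirection to Mandatory Code with return.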
A. system constraint language abnf 
a. system constraint language abnf
ALPHA = %x41-5A / %x61-7A; # A-Z / a-z
DIGIT = %x30-39; #0-9
DEC_DIGITS = DIGIT *(DIGIT);
HEX_DIGIT = %x30-39 / %x61-66; #0-9 / a-f
HEX_DIGITS = HEX_DIGIT *(HEX_DIGIT);
NUMBER = "0x" HEX_DIGITS / "#" DEC_DIGITS;
CRLF = %x0d %x0a / %x0a;
COMMENT_CHAR = %x20-29 / %x2B-2E / %x30-5A / %x5E-FE;
COMMENTS = *(CRLF) "/*" *(COMMENT_CHAR) "*/" *(CRLF);
comparison_op = "<" / ">" / "<=" / ">=" / "!=" / "=";
comparison_operation = *(%x20) comparison_op *(%x20);
binop = "+" / "-" / "*" / "/" / "&" / "|" / " xor " / "<<" / ">>" / "@";
binary_operation = *(%x20) binop *(%x20);
unaryop = "~";
extractop = "[" NUMBER ":" NUMBER "]";
identifier = ALPHA *(ALPHA / DIGIT / "_" / "." );
start = constraint_rule *(CRLF / %x20) start / constraint_rule;
constraint_rule = *(CRLF / %x20) [COMMENTS] *(CRLF / %x20)
    "[" identifier "]" *(CRLF / %x20) constraint_set;
constraint_set = constraint CRLF constraint_set / constraint;
expression = "(" expression binary_operation expression ")" /
    unaryop expression / "(" expression extractop ")" / NUMBER;
constraint = identifier_expression comparison_operation
    identifier_expression / "@" identifier;
identifier_expression = identifier / "(" identifier_expression
    binary_operation identifier_expression ")" /
    unaryop identifier_expression / "(" identifier_expression extractop ")"
    / NUMBER;
Listing : ABNF of the constraint input language
a. evaluation design points
Pm Pc Binary Overhead dr(µs) dw
Table : Design points of the design space exploration of the evaluation scenario.
Bit fields: N (31), Z (30), C (29), V (28), Reserved (23-20), IT (15-10), E (9),
A (8), I (7), F (6), T (5), M (4-0)
Figure : The Program Status Register of an ARM processor. Empty bit fields are
processor specific and are left out for abstraction.
a. the armv(t) isa
The ARMv ISA is a  bit instruction set architecture. A  bit instruction set exists
and is called THUMB (ARMvt). In any of these two instruction sets the instruction
can operate on the following registers. The ARM processor core registers consist of:
• thirteen general-purpose  bit registers, r0 to r12
• three -bit registers for special use, r13 to r15
The register r13 to r15 are used for the following uses.
• SP (r13), the stack pointer: The register r13 is used as a pointer to the current
stack location in memory.
• LR (r14), the link register: The register r14 stores the return address from
routines.
• PC (r15), the program counter: The register r15 stores the address of the
currently executing instruction.
The Program Status Register (PSR) is a special purpose register, which can only
be accessed by special instructions. Its format is depicted in Figure . It contains
the condition flags and the operation mode flags. The following table sums up the
semantics of each bit field. Depending on the processor mode M, the access to some
of the bit fields is restricted and can trigger an interrupt.
As indicated by the T bit of the PSR, most ARM cores also feature a Thumb
instruction set which contains a set of 16 bit instructions in addition to the 32 bit
instructions of the normal ARM operation mode.
A. the armv(t) isa 
Field   Semantic
N       Negative condition code flag
Z       Zero condition code flag
C       Carry condition code flag
V       Overflow condition code flag
IT      If-Then execution bits for the THUMB IT instruction
E       Endianness execution state bit. 0=Little endian, 1=Big endian operation
A       Asynchronous abort interrupt disable bit
I       Interrupt disable bit
F       Fast interrupt disable bit
T       Thumb execution state bit. 1=Thumb mode activated
M       Processor Mode
Table : The semantics of the PSR bitfields.
Table  gives an overview over the ARM ISA mnemonics referenced throughout
this thesis. A complete list can be found in [ARMa].
Mnemonic                      Semantic
mov r_dst, r_src              Moves the contents of register r_src into the register r_dst
add r_dst, r_src, #imm        Stores the result of r_src + #imm in register r_dst
sub r_dst, r_src, #imm        Mnemonic for add r_dst, r_src, -#imm
ldr r_dst, [r_src, r_off]     Loads the word at memory location [r_src + r_off] into r_dst
ldr r_dst, [r_src, #imm]      Loads the word at memory location [r_src + #imm] into r_dst
ldrb r_dst, [r_src, r_off]    Loads the byte at memory location [r_src + r_off] into r_dst
ldrb r_dst, [r_src, #imm]     Loads the byte at memory location [r_src + #imm] into r_dst
ldrh r_dst, [r_src, r_off]    Loads the halfword at memory location [r_src + r_off] into r_dst
ldrh r_dst, [r_src, #imm]     Loads the halfword at memory location [r_src + #imm] into r_dst
str r_src, [r_dst, r_off]     Stores the word in r_src to memory location [r_dst + r_off]
str r_src, [r_dst, #imm]      Stores the word in r_src to memory location [r_dst + #imm]
strb r_src, [r_dst, r_off]    Stores the byte in r_src to memory location [r_dst + r_off]
strb r_src, [r_dst, #imm]     Stores the byte in r_src to memory location [r_dst + #imm]
strh r_src, [r_dst, r_off]    Stores the halfword in r_src to memory location [r_dst + r_off]
strh r_src, [r_dst, #imm]     Stores the halfword in r_src to memory location [r_dst + #imm]
bx r_dst                      Sets the program counter to the value of register r_dst
bl #imm                       Sets the program counter to #imm and performs mov lr, pc
Table : The ARM mnemonics referenced in this thesis. For a complete list of all
ARM/THUMB mnemonics see [ARMa].
AUTHOR'S RELATED PUBLICATIONS
[BBK+] Markus Becker, Daniel Baldin, Christoph Kuznik, M. tech. Mabel Mary
Joy, Tao Xie, and Wolfgang Mueller. Xemu: An efficient qemu based bi-
nary mutation testing framework for embedded software. EMSOFT:
Teenth ACM International Conference on Embedded Software  Pro-
ceedings, . (Cited on page .)
[BGKO] Daniel Baldin, Stefan Groesbrink, Timo Kerstan, and Simon Oberthuer.
Towards constraint-based binary code optimization using annotated con-
trol flow graphs. In nd Annual International Conference on Advances
in Distributed and Parallel Computing, number  in Proceedings of the
nd International Conference on Advances in Distributed and Parallel
Computing (ADPC), pages –. Global Science and Technology Fo-
rum, September . (Cited on page .)
[BGOa] Daniel Baldin, Stefan Groesbrink, and Simon Oberthuer. Reconfigura-
tion of legacy software artifacts on resource constraint smart cards (Best
Paper Award). In Proceedings of the Second International Conference
on Mobile Services, Resources and Users, MOBILITY, pages –.
ThinkMind, . (Cited on pages  and .)
[BGOb] Daniel Baldin, Stefan Groesbrink, and Simon Oberthür. Enabling
constraint-based binary reconfiguration by binary analysis. GSTF Jour-
nal on Computing (JoC), ():–, January . (Cited on page .)


BIBLIOGRAPHY
[Alt] Peter Altenbernd. On the false path problem in hard real-time
programs. In In Proceedings of the th Euromicro Workshop on
Real-time Systems, pages –, . (Cited on page .)
[ARMa] ARM Ltd. ARM Architecture Reference Manual, . (Cited on
pages xvi, , , and .)
[ARMb] ARM Ltd. Base Platform ABI. http://infocenter.arm.com/
help/topic/com.arm.doc.ihi0037b/IHI0037B_bpabi.pdf, .
(Cited on page .)
[ARMc] ARM Ltd. C++ ABI. http://infocenter.arm.com/help/topic/
com.arm.doc.ihi0041c/IHI0041C_cppabi.pdf, . (Cited on
page .)
[ARMd] ARM Ltd. C Library ABI. http://infocenter.arm.com/help/
topic/com.arm.doc.ihi0039b/IHI0039B_clibabi.pdf, .
(Cited on page .)
[ARMe] ARM Ltd. ELF for the ARM Architecture. http:
//infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/
IHI0044D_aaelf.pdf, . (Cited on pages  and .)
[ARMf] ARM Ltd. Procedure Call Standard for the ARM Archi-
tecture. http://infocenter.arm.com/help/topic/com.arm.doc.
ihi0042d/IHI0042D_aapcs.pdf, . (Cited on page .)
[ARMg] ARM Ltd. Run-time ABI. http://infocenter.arm.com/help/
topic/com.arm.doc.ihi0043c/IHI0043C_rtabi.pdf, . (Cited
on page .)

[ASU] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: prin-
ciples, techniques, and tools. Addison-Wesley Longman Publishing
Co., Inc., Boston, MA, USA, . (Cited on pages , , and .)
[AT&] AT&T. System V Application Binary Interface v...
http://infocenter.arm.com/help/topic/com.arm.doc.
ihi0042d/IHI0042D_aapcs.pdf, . (Cited on page .)
[Aug] David Isaac August. Hyperblock performance optimizations for ilp
processors, . (Cited on page .)
[BAR] Yosi Ben Asher and Nadav Rotem. The effect of unrolling and
inlining for python bytecode optimizations. In Proceedings of SYS-
TOR : The Israeli Experimental Systems Conference, SYSTOR
’, pages :–:, New York, NY, USA, . ACM. (Cited on
page .)
[BBK+] Markus Becker, Daniel Baldin, Christoph Kuznik, M. tech. Ma-
bel Mary Joy, Tao Xie, and Wolfgang Mueller. Xemu: An efficient
qemu based binary mutation testing framework for embedded soft-
ware. EMSOFT: Teenth ACM International Conference on Em-
bedded Software  Proceedings, . (Cited on page .)
[BCA+] Gordon S. Blair, Geoff Coulson, Anders Andersen, Lynne Blair,
Michael Clarke, Fabio Costa, Hector Duran-Limon, Tom Fitz-
patrick, Lee Johnston, Rui Moreira, Nikos Parlavantzas, and Katia
Saikoski. The design and implementation of open orb . IEEE DIS-
TRIBUTED SYSTEMS ONLINE, :, . (Cited on page .)
[BCF+] Michael G. Burke, Jong-Deok Choi, Stephen Fink, David Grove,
Michael Hind, Vivek Sarkar, Mauricio J. Serrano, V. C. Sreedhar,
Harini Srinivasan, and John Whaley. The jalapeno dynamic optimiz-
ing compiler for java. In Proceedings of the ACM  conference
on Java Grande, JAVA ’, pages –, New York, NY, USA,
. ACM. (Cited on page .)
[BDD+] J. Bergeron, M. Debbabi, J. Desharnais, M. M. Erhioui, Y. Lavoie,
and N. Tawbi. Static detection of malicious code in executable
programs. Int. J. of Req. Eng, . (Cited on page .)
[BDEK] J. Bergeron, Mourad Debbabi, M. M. Erhioui, and Béchir Ktari.
Static analysis of binary code to isolate malicious behaviors. In
Proceedings of the th Workshop on Enabling Technologies on In-
frastructure for Collaborative Enterprises, WETICE ’, pages –
, Washington, DC, USA, . IEEE Computer Society. (Cited
on page .)
[Bel] Fabrice Bellard. Qemu, a fast and portable dynamic translator. In
Proceedings of the annual conference on USENIX Annual Techni-
cal Conference, ATEC ’, pages –, Berkeley, CA, USA, .
USENIX Association. (Cited on page .)
[BGKO] Daniel Baldin, Stefan Groesbrink, Timo Kerstan, and Simon
Oberthuer. Towards constraint-based binary code optimization us-
ing annotated control flow graphs. In nd Annual International Con-
ference on Advances in Distributed and Parallel Computing, num-
ber  in Proceedings of the nd International Conference on Ad-
vances in Distributed and Parallel Computing (ADPC), pages –
. Global Science and Technology Forum, September . (Cited
on page .)
[BGOa] Daniel Baldin, Stefan Groesbrink, and Simon Oberthuer. Recon-
figuration of legacy software artifacts on resource constraint smart
cards. In Proceedings of the Second International Conference on Mo-
bile Services, Resources and Users, MOBILITY, pages –.
ThinkMind, . (Cited on pages  and .)
[BGOb] Daniel Baldin, Stefan Groesbrink, and Simon Oberthür. Enabling
constraint-based binary reconfiguration by binary analysis. GSTF
Journal on Computing (JoC), ():–, January . (Cited on
page .)
[BN] David Brumley and James Newsome. Alias analysis for assembly.
Technical report, Carnegie Mellon University School of Computer
Science, . (Cited on page .)
[BTS] Florian Brandner, Tommy Thorn, and Martin Schoeberl. Embedded
jit compilation with cacao on yari. In Proceedings of the  IEEE
International Symposium on Object/Component/Service-Oriented
Real-Time Distributed Computing, ISORC ’, pages –, Wash-
ington, DC, USA, . IEEE Computer Society. (Cited on
page .)
[But] Giorgio C. Buttazzo. Hard Real-time Computing Systems: Pre-
dictable Scheduling Algorithms And Applications (Real-Time Sys-
tems Series). Springer-Verlag TELOS, Santa Clara, CA, USA, .
(Cited on pages  and .)
[CC] Vitaly Chipounov and George Candea. Enabling sophisticated anal-
yses of x binaries with RevGen. In International Conference
on Dependable Systems and Networks Workshops, . (Cited on
page .)
[CEa] Cristina Cifuentes and Mike Van Emmerik. Recovery of jump table
case statements from binary code. In Proceedings of the th Inter-
national Workshop on Program Comprehension, IWPC ’, pages
–, Washington, DC, USA, . IEEE Computer Society. (Cited
on pages  and .)
[CEb] Cristina Cifuentes and Mike Van Emmerik. Recovery of jump table
case statements from binary code. In Science of Computer Program-
ming, pages –, . (Cited on page .)
[cho] choco Team. choco: an Open Source Java Constraint Programming
Library. Research report --INFO, École des Mines de Nantes,
. (Cited on page .)
[Cif] Cristina Cifuentes. Reverse Compilation Techniques. Phd thesis,
Queensland University of Technology, Brisbane, Australia, .
(Cited on pages  and .)
[CJ] Duc-Hiep Chu and Joxan Jaffar. Symbolic simulation on compli-
cated loops for wcet path analysis. In Proceedings of the ninth ACM
international conference on Embedded software, EMSOFT ’, pages
–, New York, NY, USA, . ACM. (Cited on page .)
[CLS] MichałCierniak, Guei-Yuan Lueh, and James M. Stichnoth. Prac-
ticing judo: Java under dynamic optimizations. SIGPLAN Not.,
bibliography 
():–, May . (Cited on page .)
[CS] C. Cifuentes and S. Sendall. Specifying the semantics of machine
instructions. In Proceedings of the th International Workshop on
Program Comprehension, IWPC ’, pages –, Washington, DC,
USA, . IEEE Computer Society. (Cited on pages  and .)
[CSF] Cristina Cifuentes, Doug Simon, and Antoine Fraboulet. Assembly
to high-level language translation. In In Int. Conf. on Softw. Maint,
pages –. IEEE-CS Press, . (Cited on pages , , ,
and .)
[CvdB] Ramkumar Chinchani and Eric van den Berg. A fast static analysis
approach to detect exploit code inside network flows. In Alfonso
Valdes and Diego Zamboni, editors, Recent Advances in Intrusion
Detection, volume  of Lecture Notes in Computer Science, pages
–. Springer Berlin / Heidelberg, . (Cited on page .)
[Dav] Martin Davis. Computability and unsolvability. . (Cited on
page .)
[DBDSVP+] Bruno De Bus, Bjorn De Sutter, Ludo Van Put, Dominique Chanet,
and Koen De Bosschere. Link-time optimization of arm binaries.
In Proceedings of the  ACM SIGPLAN/SIGBED conference
on Languages, compilers, and tools for embedded systems, LCTES
’, pages –, New York, NY, USA, . ACM. (Cited on
page .)
[DFEV] Adam Dunkels, Niclas Finne, Joakim Eriksson, and Thiemo Voigt.
Run-time dynamic linking for reprogramming wireless sensor net-
works. In Proceedings of the th international conference on Embed-
ded networked sensor systems, SenSys ’, pages –, New York,
NY, USA, . ACM. (Cited on pages  and .)
[DMW] Saumya Debray, Robert Muth, and Matthew Weippert. Alias
analysis of executable code. In Proceedings of the th ACM
SIGPLAN-SIGACT symposium on Principles of programming lan-
guages, POPL ’, pages –, New York, NY, USA, . ACM.
(Cited on page .)
[DSVPC+] Bjorn De Sutter, Ludo Van Put, Dominique Chanet, Bruno De Bus,
and Koen De Bosschere. Link-time compaction and optimization of
arm executables. ACM Trans. Embed. Comput. Syst., , February
. (Cited on page .)
[eet] ARM sales, profits rise as partners gain market share.
http://www.eetimes.com/electronic-news/4235514/
ARM-financial-results, . (Cited on page .)
[EG] Andreas Ermedahl and Jan Gustafsson. Deriving annotations for
tight calculation of execution time. In Proceedings of the Third Inter-
national Euro-Par Conference on Parallel Processing, Euro-Par ’,
pages –, London, UK, UK, . Springer-Verlag. (Cited
on page .)
[ES] N. Een and N. Sörensson. MiniSat v. - A SAT Solver with
Conflict-Clause Minimization, System description for the SAT com-
petition, . (Cited on page .)
[Fab] R. S. Fabry. How to design a system in which modules can be
changed on the fly. In Proceedings of the nd international confer-
ence on Software engineering, ICSE ’, pages –, Los Alami-
tos, CA, USA, . IEEE Computer Society Press. (Cited on
page .)
[FE] M. Fernandez and R. Espasa. Speculative alias analysis for exe-
cutable code. In Parallel Architectures and Compilation Techniques,
. Proceedings.  International Conference on, pages  –
, . (Cited on page .)
[FL] Charles N. Fischer and Richard J. LeBlanc, Jr. Crafting a compiler
with C. Benjamin-Cummings Publishing Co., Inc., Redwood City,
CA, USA, . (Cited on pages  and .)
[FMPS] Andrea Flexeder, Bogdan Mihaila, Michael Petter, and Helmut
Seidl. Interprocedural control flow reconstruction. In Proceedings
of the th Asian conference on Programming languages and sys-
tems, APLAS’, pages –, Berlin, Heidelberg, . Springer-
Verlag. (Cited on pages  and .)
[GD] Vijay Ganesh and David L. Dill. A decision procedure for bit-vectors
and arrays. In Computer Aided Verification (CAV ’), Berlin, Ger-
many, July . Springer-Verlag. (Cited on page .)
[GEL] Jan Gustafsson, Andreas Ermedahl, and Björn Lisper. Towards a
flow analysis for embedded system c programs. In Proceedings of the
th IEEE International Workshop on Object-Oriented Real-Time
Dependable Systems, WORDS ’, pages –, Washington, DC,
USA, . IEEE Computer Society. (Cited on page .)
[GH] Rakesh Ghiya and Laurie Hendren. Connection analysis: A practical
interprocedural heap analysis for c. International Journal of Parallel
Programming, , . (Cited on page .)
[GNU] GNU. Gnu binutils. http://www.gnu.org/software/binutils/,
September . (Cited on page .)
[GPF] Andreas Gal, Christian W. Probst, and Michael Franz. Hotpathvm:
an effective jit compiler for resource-constrained devices. In Pro-
ceedings of the nd international conference on Virtual execution
environments, VEE ’, pages –, New York, NY, USA, .
ACM. (Cited on page .)
[HCYC] Wei Chung Hsu, Howard Chen, Pen Chung Yew, and Dong-Yuan
Chen. On the predictability of program behavior using different
input data sets. In Proceedings of the Sixth Annual Workshop on
Interaction between Compilers and Computer Architectures, INTER-
ACT ’, pages –, Washington, DC, USA, . IEEE Computer
Society. (Cited on page .)
[HF] Johannes Helander and Alessandro Forin. Mmlite: a highly com-
ponentized system architecture. In Proceedings of the th ACM
SIGOPS European workshop on Support for composing distributed
applications, EW , pages –, New York, NY, USA, . ACM.
(Cited on page .)
[HKS+] Chih-Chieh Han, Ram Kumar, Roy Shea, Eddie Kohler, and Mani
Srivastava. A dynamic operating system for sensor nodes. In Pro-
ceedings of the rd international conference on Mobile systems, ap-
plications, and services, MobiSys ’, pages –, New York, NY,
USA, . ACM. (Cited on pages  and .)
[HN] Michael Hicks and Scott Nettles. Dynamic software updating.
ACM Trans. Program. Lang. Syst., ():–, November .
(Cited on page .)
[Hor] Susan Horwitz. Precise flow-insensitive may-alias analysis is np-
hard. ACM Trans. Program. Lang. Syst., ():–, January .
(Cited on page .)
[IDA] IDAPro disassembler. http://www.hex-rays.com/idapro/. (Cited
on pages  and .)
[Ihl] Torsten Ihle. Static profiling, . (Cited on pages  and .)
[IKY+] Kazuaki Ishizaki, Motohiro Kawahito, Toshiaki Yasue, Mikio
Takeuchi, Takeshi Ogasawara, Toshio Suganuma, Tamiya Onodera,
Hideaki Komatsu, and Toshio Nakatani. Design, implementation,
and evaluation of optimizations in a just-in-time compiler. In
Proceedings of the ACM  conference on Java Grande, JAVA
’, pages –, New York, NY, USA, . ACM. (Cited on
page .)
[ipva] RFC  - Internet Protocol Version . http://www.ietf.org/
rfc/rfc2460.txt. (Cited on page .)
[ipvb] RFC  - Internet Protocol Version . http://www.ietf.org/rfc/
rfc791.txt. (Cited on page .)
[Jah] J. Jahn. Vector Optimization: Theory, Applications, and Extensions.
Springer, . (Cited on page .)
[Jan] N Janssens. Dynamic Software Reconfiguration in Programmable
Networks. PhD thesis, Katholieke Universiteit Leuven, . (Cited
on pages xiv and .)
[Jer] Ahmed A. Jerraya. Long term trends for embedded system design.
In Proceedings of the Digital System Design, EUROMICRO Sys-
tems, DSD ’, pages –, Washington, DC, USA, . IEEE
Computer Society. (Cited on page .)
[JMMV] Nico Janssens, Sam Michiels, Tom Mahieu, and Pierre Verbaeten.
Towards hot-swappable system software: The dips/cups component
framework. In Proceedings - The Seventh International Workshop
on Component Oriented Programming, . (Cited on page .)
[KH] Ralph Keller and Urs Hölzle. Binary component adaptation. In
Proceedings of the th European Conference on Object-Oriented
Programming, pages –, London, UK, . Springer-Verlag.
(Cited on page .)
[Kil] Gary A. Kildall. A unified approach to global program optimization.
In Proceedings of the st annual ACM SIGACT-SIGPLAN sympo-
sium on Principles of programming languages, POPL ’, pages –
, New York, NY, USA, . ACM. (Cited on page .)
[Kin] Tim Kindberg. Reconfiguring client-server systems. Technical re-
port, Proc. International Workshop on Configurable Distributed Sys-
tems (IWCDS’, . (Cited on page .)
[kLCM+] Chi keung Luk, Robert Cohn, Robert Muth, Harish Patil, Ar-
tur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa, and
Reddi Kim Hazelwood. Pin: Building customized program anal-
ysis tools with dynamic instrumentation. In In Programming Lan-
guage Design and Implementation, pages –. ACM Press, .
(Cited on pages  and .)
[kLMP+] Chi keung Luk, Robert Muth, Harish Patil, Robert Cohn, and Geoff
Lowney. Ispike: A post-link optimizer for the intel itanium architec-
ture. In In IEEE/ACM International Symposium on Code Genera-
tion and Optimization, pages –, . (Cited on page .)
[KM] Jeff Kramer and Jeff Magee. The evolving philosophers problem: Dy-
namic change management. IEEE Trans. Softw. Eng., ():–
, November . (Cited on page .)
[Kno] Jens Knoop. Optimal interprocedural program optimization: a new
framework and its application. Springer-Verlag, Berlin, Heidelberg,
. (Cited on page .)
[Knu] Donald E. Knuth. Computer Programming as an Art. ():–
, December . (Cited on page v.)
[KP] Joel Koshy and Raju Pandey. Vmstar: synthesizing scalable runtime
environments for sensor networks. In Proceedings of the rd inter-
national conference on Embedded networked sensor systems, SenSys
’, pages –, New York, NY, USA, . ACM. (Cited on
page .)
[KRL+] Fabio Kon, Manuel Román, Ping Liu, Jina Mao, Tomonori Ya-
mane, Claudio Magalhã, and Roy H. Campbell. Monitoring, security,
and dynamic configuration with the dynamictao reflective orb. In
IFIP/ACM International Conference on Distributed systems plat-
forms, Middleware ’, pages –, Secaucus, NJ, USA, .
Springer-Verlag New York, Inc. (Cited on page .)
[KRV] C. Kruegel, W. Robertson, and G. Vigna. Detecting kernel-level
rootkits through binary analysis. In Computer Security Applications
Conference, . th Annual, pages  – , dec. . (Cited
on page .)
[KU] Marc A. Kaplan and Jeffrey D. Ullman. A general scheme for the
automatic inference of variable types. In Proceedings of the th
ACM SIGACT-SIGPLAN symposium on Principles of programming
languages, POPL ’, pages –, New York, NY, USA, . ACM.
(Cited on page .)
[KU] Marc A. Kaplan and Jeffrey D. Ullman. A scheme for the automatic
inference of variable types. J. ACM, ():–, January .
(Cited on page .)
[KV] Johannes Kinder and Helmut Veith. Jakstab: A static analysis plat-
form for binaries. In Proceedings of the th international conference
on Computer Aided Verification, CAV ’, pages –, Berlin,
Heidelberg, . Springer-Verlag. (Cited on page .)
[KV] J. Kinder and H. Veith. Precise static analysis of untrusted driver
binaries. In Formal Methods in Computer-Aided Design (FMCAD),
, pages  –, oct. . (Cited on page .)
[KZV] Johannes Kinder, Florian Zuleger, and Helmut Veith. An abstract
interpretation-based framework for control flow reconstruction from
binaries. In Proceedings of the th International Conference on
Verification, Model Checking, and Abstract Interpretation, VMCAI
’, pages –, Berlin, Heidelberg, . Springer-Verlag. (Cited
on page .)
[LC] Philip Levis and David Culler. Mate: a tiny virtual machine for
sensor networks. SIGOPS Oper. Syst. Rev., ():–, October
. (Cited on page .)
[Lev] John R. Levine. Linkers and Loaders. Morgan Kaufmann Publishers
Inc., San Francisco, CA, USA, st edition, . (Cited on page .)
[Lev] TinyOS: An operating system for sensor networks. pages –.
Springer, . (Cited on page .)
[LL] C.L. Liu and James Layland. Scheduling algorithms for multi-
programming in a hard-real-time environment, . (Cited on
page .)
[LL] Peter Lee and Mark Leone. Optimizing ml with run-time code
generation. SIGPLAN Not., ():–, May . (Cited on
page .)
[LM] Yau-Tsun Steven Li and Sharad Malik. Performance analysis of em-
bedded software using implicit path enumeration. SIGPLAN Not.,
():–, November . (Cited on page .)
[LR] Horst Lichter and Gerhard Riedinger. Improving software quality by
static program analysis. Software Process: Improvement and Prac-
tice, ():–, . (Cited on page .)
[MG] Kaveh Moazami-Goudarzi. Consistency preserving dynamic recon-
figuration of distributed systems. PhD thesis, Imperial College, Lon-
don, . (Cited on page .)
[MHJV] Sam Michiels, Wouter Horré, Wouter Joosen, and Pierre Verbaeten.
Davim: a dynamically adaptable virtual machine for sensor networks.
In Proceedings of the international workshop on Middleware for sen-
sor networks, MidSens ’, pages –, New York, NY, USA, .
ACM. (Cited on page .)
[MP] Nestor Michelena and Panos Papalambros. A hypergraph framework
for optimal model-based decomposition of design problems. Com-
putational Optimization and Applications, :–, . (Cited
on page .)
[MR] E. Morel and C. Renvoise. Global optimization by suppression of
partial redundancies. Commun. ACM, ():–, February .
(Cited on page .)
[mWHMC+] Wen mei W. Hwu, Scott A. Mahlke, William Y. Chen, Pohua P.
Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Ouel-
lette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G.
Holm, and Daniel M. Lavery. The superblock: An effective tech-
nique for vliw and superscalar compilation. THE JOURNAL OF
SUPERCOMPUTING, :–, . (Cited on page .)
[PCB+] L. Van Put, D. Chanet, B. De Bus, B. De Sutter, and K. De Boss-
chere. Diablo: a reliable, retargetable and extensible link-time rewrit-
ing framework. International Symposium on Signal Processing and
Information Technology, :–, . (Cited on page .)
[Pro] The GCC Home Page GNU Project. Free software foundation. http:
//gcc.gnu.org, September . (Cited on page .)
[RD] Baer Roland and Fischer Daniel. Code-coverage auf kleinen targets.
. (Cited on page .)
[Rhi] Morten Rhiger. Run-time code generation for type-directed partial
evaluation, . (Cited on page .)
[SBB+] B. De Sutter, B. De Bus, K. De Bosschere, P. Keyngnaert, and
B. Demoen. On the static analysis of indirect control transfers in
binaries. In PDPTA, pages –, . (Cited on page .)
[SDAL] Benjamin Schwarz, Saumya Debray, Gregory Andrews, and
Matthew Legendre. Plto: A link-time optimizer for the intel ia-
 architecture. In Proc.  Workshop on Binary Translation
(WBT-, . (Cited on page .)
[SDJ] Bogong Su, Shiyuan Ding, and Lan Jin. An improvement of trace
scheduling for global microcode compaction. SIGMICRO Newsl.,
():–, December . (Cited on page .)
[SM] S. M. Sadjadi and P. K. Mckinley. Act: An adaptive corba template
to support unanticipated adaptation. Technical report, . (Cited
on page .)
[SP] Micha Sharir and Amir Pnueli. Two approaches to interprocedural
data flow analysis, chapter , pages –. Prentice-Hall, Engle-
wood Cliffs, NJ, . (Cited on page .)
[SYK+] Toshio Suganuma, Toshiaki Yasue, Motohiro Kawahito, Hideaki Ko-
matsu, and Toshio Nakatani. A dynamic optimization framework
for a java just-in-time compiler. SIGPLAN Not., ():–,
October . (Cited on page .)
[tcp] RFC  - Transmission Control Protocol. http://www.ietf.org/
rfc/rfc793.txt. (Cited on page .)
[The] Henrik Theiling. Extracting safe and precise control flow from
binaries. In Proc. th Conference on Real-Time
Computing Systems and Applications, . (Cited on
pages  and .)
[The] H Theiling. Control Flow Graphs for Real-Time System Analysis.
PhD thesis, Universitaet des Saarlandes, . (Cited on pages ,
, , and .)
[tls] RFC  - Transport Layer Security Version .. http://www.ietf.
org/rfc/rfc4346.txt. (Cited on page .)
[tr] ARM Holdings eager for PC and server expansion. http:
//www.theregister.co.uk/2011/02/01/arm_holdings_q4_2010_
numbers/, . (Cited on page .)
[TVJ+] Eddy Truyen, Bart Vanhaute, Wouter Joosen, Pierre Verbaeten, and
Bo Norregaard Jorgensen. Dynamic and selective combination of
extensions in component-based applications. In Proceedings of the
rd International Conference on Software Engineering (ICSE’,
. (Cited on page .)
[udp] RFC  - User Datagram Protocol. http://www.ietf.org/rfc/
rfc768.txt. (Cited on page .)
[Van] Jean-Jacques Vandewalle. Smart card research perspectives. In Pro-
ceedings of the  international conference on Construction and
Analysis of Safe, Secure, and Interoperable Smart Devices, CAS-
SIS’, pages –, Berlin, Heidelberg, . Springer-Verlag.
(Cited on page .)
[VG] Ramakrishnan Venkitaraman and Gopal Gupta. Static program
analysis of embedded executable assembly code. In Proceedings of
the  international conference on Compilers, architecture, and
synthesis for embedded systems, CASES ’, pages –, New
York, NY, USA, . ACM. (Cited on page .)
[Wag] David A. Wagner. Static analysis and computer security: New tech-
niques for software assurance. Technical report, . (Cited on
page .)
[WD] David Wagner and Drew Dean. Intrusion detection via static anal-
ysis. In Proceedings of the  IEEE Symposium on Security and
Privacy, SP ’, pages –, Washington, DC, USA, . IEEE
Computer Society. (Cited on page .)
[WS] Ian Welch and Robert J. Stroud. Kava - a reflective java based on
bytecode rewriting. In Proceedings of the st OOPSLA Workshop on
Reflection and Software Engineering: Reflection and Software Engi-
neering, Papers from OORaSE , pages –, London, UK,
UK, . Springer-Verlag. (Cited on page .)
[XLC] Qiang Xie, Jinfeng Liu, and Pai H. Chou. Tapper: a lightweight
scripting engine for highly constrained wireless sensor nodes. In
Proceedings of the th international conference on Information pro-
cessing in sensor networks, IPSN ’, pages –, New York, NY,
USA, . ACM. (Cited on page .)
[XMZX] Nai Xia, Bing Mao, Qingkai Zeng, and Li Xie. Efficient and practical
control flow monitoring for program security. In Proceedings of the
th Asian computing science conference on Advances in computer
science: secure software and related issues, ASIAN’, pages –,
Berlin, Heidelberg, . Springer-Verlag. (Cited on page .)
[XSS] L. Xu, F. Sun, and Z. Su. Constructing Precise Control Flow Graphs
from Binaries. University of California, Davis, Tech. Rep, .
(Cited on page .)
[YMP+] Byung-Sun Yang, Soo-Mook Moon, Seongbae Park, Junpyo Lee,
SeungIl Lee, Jinpyo Park, Yoo C. Chung, Suhyun Kim, Kemal
Ebcioglu, and Erik Altman. Latte: A java vm just-in-time com-
piler with fast and efficient register allocation. In Proceedings of the
 International Conference on Parallel Architectures and Com-
pilation Techniques, PACT ’, pages –, Washington, DC, USA,
. IEEE Computer Society. (Cited on page .)

DECLARATION
I hereby declare that this thesis is my own work and effort and that it has not been
submitted anywhere for any award. Where other sources of information have been
used, they have been acknowledged.
Paderborn, February 
Daniel Baldin
