Digital Control of Micro-Systems Using On-Line Arithmetic by Dimmler, M.
Thesis No. 2050 (1999)
presented to the Department of Mechanical Engineering
École Polytechnique Fédérale de Lausanne
for the degree of
Docteur ès Sciences Techniques
by
Martin Dimmler
Graduate mechanical engineer, Universität Karlsruhe
native of Karlsruhe (Germany)
accepted on the recommendation of the jury:
Prof. R. Longchamp, examiner
Prof. H. Bleuler, co-examiner
Prof. J.M. Muller, co-examiner
Prof. U. Holmberg, co-examiner
Dr. J. Moerschell, co-examiner
Lausanne, EPFL
1999

Acknowledgements
I am very grateful to my supervisor, Prof. Roland Longchamp, and
to Prof. Dominique Bonvin for having recruited me to their group. The
friendship that they extend to their assistants is truly invaluable. I spent
several very fruitful years with them, and for this I thank them.
I also want to thank Prof. Hannes Bleuler and Prof. Jean-Michel
Muller for their consideration and helpful comments in completing this
work, and for acting as my co-examiners. Prof. Ulf Holmberg is
similarly acknowledged for acting as a co-examiner and for his
continuing interest and encouragement throughout this work. I also
appreciate his wide range of interests and all I learnt by working with
him.
I would also like to thank Dr. Arnaud Tisserand for his advice,
interaction, and attention to detail during the preparation of this thesis.
Our collaboration has been extremely profitable for me. Last but not least,
I thank Dr. Joseph Moerschell for sharing his competence in industrial
electronics, a domain I find extremely stimulating, and for his constant
attention and advice.
I am indebted to all of the members of the Laboratoire d’Automatique
for contributing to the pleasant working atmosphere. Financial and
technical support for this project by the Centre Suisse d’Électronique
et de Microtechnique is gratefully acknowledged.
Completing a Ph.D. thesis is a task that extends over several years.
Over such a long period, the support of relatives and friends becomes
more and more important. There are far too many people to thank
them all individually here, but they can be assured of my most sincere
gratitude. Yet I cannot avoid mentioning my wife Susanne, who
constantly encouraged me and patiently accepted additional working hours
throughout this period. Finally, of course, all of my deepest affection
goes to my parents, for having taught me those things which matter
most in life.
Abstract
The integration of control micro-electronics within mechanical mini and
micro-systems is a current trend in the design of high-performance
mechatronic systems. However, implementing controllers of higher
complexity while further decreasing the size of the system places difficult
demands on the control electronics. In order to maintain a high
computational speed and to reduce controller size, implementation complexity
and power consumption, custom electronics often become necessary.
Currently, there are two trends towards progressive miniaturization.
One is a pure technological optimization (shrinking of transistor and
interconnection dimensions) which is based on existing algorithms. The
other consists of efforts to change the signal processing structure. In
this thesis, the latter approach is followed and it is demonstrated that
serial computations with most significant digits first (MSDF), that is on-
line arithmetic, offer an important potential for real-time control. They
allow combination of traditional functions, such as analog to digital con-
verters and control data computations. This introduces a parallelism
between sequential operations by overlapping these in a digit-pipelined
fashion. Additionally, a parallelism at the operator level becomes pos-
sible because of the small size and low interconnection bandwidth of
on-line arithmetic operations. This makes controller construction very
modular and leads to very efficient controller implementations with
small size, high speed and low power consumption.
In this thesis the use of on-line arithmetic for real-time control is
presented in comparison with classical methods like digit-parallel ap-
proaches or least significant digit first (LSDF) arithmetic. Theoretical
aspects of on-line arithmetic have already been known for about 20 years
in the computer science literature, but they were never applied to
real-time control. Therefore, most control engineers are not familiar with
this method, and a short introduction to the basic concepts of on-line
arithmetic is given.
A study of the on-line arithmetic literature showed that
no unified framework existed for the interconnection of on-line operators
into complex algorithms. In order to simplify on-line arithmetic
design and to make it accessible to control engineers, two implementation
concepts are presented. The first one extends the mathematical
on-line operators in a way that unifies the interfaces between different
operators. This leads to Modular On-Line Operators which can be
directly combined into control algorithms. This method is simple and can
be employed easily by a non-specialist in the field of computer
arithmetic. However, for some applications, the restrictions on the scale
of intermediate results lead to an increase in the operand length
and thereby to higher computation time and circuit size. For
implementations requiring higher performance, a second method is added
which demands slightly more insight into the field of on-line arithmetic
but leads to faster and smaller solutions. For both methods, questions
specific to real-time control are discussed.
For digit-serial computations the choice of the radix has an
important influence on the controller speed (smaller operand length, i.e. fewer
clock cycles for a higher radix), but also on controller size. Therefore,
the influence of the radix is discussed and the choice of radix 2 for
real-time control implementations is proposed.
In the last part of this thesis, a detailed comparison with digit-parallel
arithmetic is presented. Finally, the method is applied to two controllers for
mechatronic systems, namely a numerical PID controller for a current loop
and a two-degrees-of-freedom controller for a piezo-electric fine-pointing
mechanism.
Zusammenfassung
The integration of microelectronics into mechanical systems enables
the development of compact precision mechanisms. However, the trend
towards miniaturization and ever higher dynamic requirements confronts
the system designer with a difficult task. Conflicting performance
criteria such as high computation speed, size restrictions, simple
implementation and low power consumption often have to be met
simultaneously. In many cases this requires the development of
application-specific hardware. In general, two approaches can be observed
for keeping pace with progressive miniaturization: on the one hand, a
purely technological optimization (shrinking of transistor and
interconnection dimensions) based on existing algorithms and, on the
other hand, efforts to change the structure of the signal processing. In
this doctoral thesis we consider the latter case and aim to show that
serial arithmetic with the most significant digit (MSD) first, called
on-line arithmetic, offers an interesting potential for real-time
control. Owing to its serial character, an overlap of the A/D converter
and the arithmetic, as well as of individual operations with one another,
becomes possible. Thanks to the small size of the individual operators
and the few serial connections between them, this parallel processing
within one data path can also be extended to several parallel data
paths. This eases controller design and leads to small, fast controller
implementations with low power consumption.
In the present work, the use of on-line arithmetic for real-time
control is motivated by comparison with classical methods such as
parallel arithmetic or standard serial arithmetic. Theoretical results on
on-line arithmetic have been available in the mathematical literature
for about 20 years, but they have never been used for real-time control.
Most control engineers are therefore not familiar with this method. For
this reason, we begin with a short introduction to the basic concepts of
on-line arithmetic.
The study of the existing literature showed that, to date, no unified
implementation guidelines exist for connecting several on-line operators
into complex algorithms. We have therefore set ourselves the goal of
simplifying design with on-line arithmetic and making it accessible to
automation engineers by introducing two implementation concepts. The
first concept extends the mathematical on-line operators with the aim of
unifying the interfaces between the individual operators. This leads to
modular on-line operators which can be connected directly to form control
algorithms. This method is easy to apply and is thus usable even by
beginners in the field of computer arithmetic. The achievable computation
times and chip sizes are, however, only suboptimal. In order to provide
solutions for applications with even higher demands, we give a second
design concept. It requires somewhat deeper insight into on-line
arithmetic but yields faster and smaller solutions. For both design
concepts, control-related questions are discussed.
For serial controller implementations the choice of the radix plays a
major role, because it influences both the computation time (shorter
number representation for higher radices) and the operator size. The
influence of the radix is therefore examined more closely, and the choice
of radix 2 for the implementation of real-time controllers is motivated
by comparisons with implementation examples in radix 4.
In the last part of this doctoral thesis, a detailed comparison with
parallel arithmetic follows with regard to controller size, computation
speed and power consumption. The method is then applied to two controller
implementations: a numerical PID current controller and a
two-degrees-of-freedom controller for a piezo precision mechanism.
Résumé
The integration of microelectronics into mechanical systems enables
the development of very compact mechanisms. The trend towards reducing
dimensions and improving dynamic properties puts the control engineer in
a difficult situation. Conflicting criteria such as high computation
speed, a restriction on system size, simple implementation and minimal
energy consumption have to be satisfied simultaneously. This often
requires the development of specific hardware. Currently, there are two
trends for keeping pace with the progressive reduction in dimensions. On
the one hand, one observes a technological optimization (reduction of
transistor and interconnection dimensions) based on existing algorithms
and, on the other hand, efforts aimed at modifying the structure of the
signal processing. In this thesis, we adopt the second approach and show
that a serial arithmetic with the most significant digits (MSD) first,
called on-line arithmetic, offers an interesting potential for the
real-time control of systems. Serial processing allows an overlap of the
A/D conversion and the arithmetic as well as an overlap of consecutive
operations. This parallelism within one data path can further be extended
to several paths thanks to the small size of the on-line operators and
the small number of connections between them. This wiring of operators
greatly simplifies the implementation of a controller and permits a small
implementation offering high computation speed and low consumption.
In this work, on-line arithmetic is motivated by comparing it with
classical methods such as approaches using digit-parallel arithmetic or
standard serial arithmetic (LSDF). Theoretical results concerning on-line
arithmetic have been published on several occasions in the mathematical
literature over the last 20 years, but they have never been exploited for
real-time control, which is why very few control engineers know this
method. We therefore begin with an introduction to on-line arithmetic.
The study of the existing literature revealed that a unified concept
for connecting on-line arithmetic operators to realize complex algorithms
is lacking. For this reason, we simplify on-line arithmetic and make it
accessible to control engineers by introducing two implementation
concepts. The first concept extends the mathematical operators with the
aim of unifying the interfaces between the different operators. This
leads to modular on-line operators which can be connected directly to
create control algorithms. This method is simple and can be applied even
by beginners in the field of computer arithmetic. The computation times
and circuit sizes obtained are, however, only suboptimal. In order to
also realize applications with more stringent specifications, we
introduce a second design method. It requires a little more knowledge in
the field of arithmetic but generates faster and smaller solutions. For
both design methods, questions related to automatic control are
discussed.
For serial computations, the choice of the radix plays an important
role, because it influences the computation time (a higher radix has a
shorter representation) and the size of the operators. In this work, the
influence of the radix is examined and the choice of radix 2 for
real-time control is motivated by comparison with implementation examples
in radix 4.
In the last part of the thesis, a detailed comparison of on-line
arithmetic and parallel arithmetic is presented with regard to size,
speed and energy consumption. The method is then applied to two different
controllers: a numerical PID controller for current control and a
two-degrees-of-freedom controller for a precision mechanism based on
piezoelectric actuators.
Contents
1 Introduction
1.1 Motivation for using On-Line Arithmetic for Real-Time Control
1.2 Related Work
1.3 Scope and Contributions of the Thesis
1.4 Outline of the Thesis
2 On-Line Arithmetic: A Short Overview
2.1 Redundant Number Systems
2.2 On-Line Arithmetic Operators
2.2.1 On-Line Adder
2.2.2 On-Line Multi-Adders
2.2.3 On-Line Multiplication
2.2.4 On-Line Division and Square Root
2.2.5 Evaluation of Polynomials
2.3 Speed and Size of Redundant Arithmetic
2.4 Conversions between Standard and Redundant Numbers
3 Design Concepts for On-Line Arithmetic Controllers
3.1 Controller Constructions based on Modular On-Line Arithmetic Operators
3.1.1 Initialization of On-Line Arithmetic Operators
3.1.2 Normalization
3.1.3 Synchronization
3.2 Controller Construction based on Global Execution Control
3.2.1 Extended Initialization
3.2.2 Extended Normalization
3.2.3 Extended Synchronization
3.3 Design Example
4 Implementation Guidelines
4.1 Simplifications with Multi-Operations
4.2 Appropriate Controller Representation
4.3 Reuse of Operators in the Same Algorithm
4.4 Hardware and Software Support
4.5 On-line Arithmetic Library
5 The Choice of the Radix
5.1 Influence on Computation Time
5.2 Implementation of On-Line Arithmetic Radix 4 Adders
5.2.1 Number and Bit-level Encoding
5.2.2 Functional Description of Radix 4 Adders
5.2.3 Comparison of Radix 4 Adders
5.3 Suitability for Real-Time Control
6 Comparison to Classical Solutions
6.1 Architectures Compared
6.1.1 Sequential digit-parallel calculation scheme
6.1.2 Full-parallel digit-parallel calculation scheme
6.2 Sampling Time Requirements of Microsystems
6.3 Speed, Size, and Power Consumption
6.3.1 Speed
6.3.2 Circuit Size
6.3.3 Power Consumption
7 Applications
7.1 PID-Demonstrator
7.1.1 Controller Representation
7.1.2 On-Line Arithmetic Computation Scheme
7.1.3 Hardware Implementation
7.1.4 Controller Performance
7.2 Piezo Tip–Tilt Mirror
7.2.1 System and Controller Representation
7.2.2 On-Line Arithmetic Computation Scheme
7.2.3 Hardware Implementation
7.2.4 System Performance
8 Conclusions
8.1 Achievements
8.2 Practical Application Perspective
8.3 Further Research
List of Abbreviations
List of Symbols
Bibliography

Chapter 1
Introduction
1.1 Motivation for using On-Line Arithmetic
for Real-Time Control
The design and manufacture of mechanical components and systems
has reached a very high standard. With the low-cost integration of
micro-electronics, this offers new possibilities for compact high-precision
mechanisms. Several applications have already appeared on the market,
for example drives, robots or fine positioning devices. They are mostly
controlled by digital controllers, such as micro-controllers, digital sig-
nal processors (DSPs) or application specific integrated circuits (ASICs)
with generally fixed parameters. The digital controllers are thereby part
of a feedback loop (see Fig. 1.1). They perform algorithms on the refer-
ence and measurement signals in order to improve the system dynamics
and to follow desired reference signals.
Figure 1.1: Digital controller in the feedback path of a mechatronic
system (block diagram with reference r_k, error e_k, controller states
z_k, controller output u_k, measurements s_k, D/A and A/D converters,
analog filter and physical system)
The circuits used are mostly based on digit-parallel arithmetic
operators which are sequentially scheduled by an instruction set in the
memory (see Fig. 1.2a). However, in most mechatronic systems, these
general purpose solutions are only necessary during controller
development. Afterwards, at run time, the controller repeats a certain number
of operations cyclically with very few user interactions. The whole
control algorithm could be realized in the form of a complex operator in
special hardware. This avoids communication delays between
memory and the arithmetic and logic unit (ALU) and offers, especially for
multiple input multiple output (MIMO) systems, a potential for
efficient parallel computation of independent terms and therefore a further
speed improvement (see Fig. 1.2b–d). The inherent disadvantage of
using digit-parallel arithmetic for these special operators is the large
number of gates, leading to increased circuit size and power
consumption. This becomes a major problem for micro-systems with embedded
controllers since, in addition to high controller speed, small dimensions
and low power consumption are the most important controller
requirements. In many mobile and aerospace applications for example, battery
lifetime and system dimensions play a major role.
In principle, there are two solutions for facing this challenge of minia-
turization. One is the pure technological approach which is driven by
the enormous progress which has been made in circuit technology and
manufacturing. The influence of increasing complexity on power con-
sumption and system dimensions is thereby kept low by shrinking the
dimensions of electrical components on the chip. This trend will cer-
tainly continue for some time. However, manufacturing cost will become
more important when approaching the physical limits. The second
approach consists of a fundamental change in the signal processing
structure before mapping it to a particular technology. Here changes are mainly
made in the arithmetic realization of the individual operators. Their
combination ultimately determines the overall performance. The main goal
of these arithmetic changes is to find an architecture which, for
a given computation time, reduces complexity and power consumption
in comparison to digit-parallel arithmetic.
In order to reduce complexity, digit-serial least-significant-digit-first
(LSDF) arithmetic (Fig. 1.2c) has often been suggested [DS88, HC90,
Kas98]. The potential advantages of the LSDF approach include:
• Simplicity and small size of the basic operators (digit level).
• Serial communication (few I/O pins).
• Potential overlapping of several operations (digit-level pipeline).
However, there are several disadvantages in the LSDF approach. First,
A/D (analog to digital) converters and operations such as division and
square root produce the outputs in most-significant-digit-first (MSDF)
form. Consequently, a sequence involving these operations cannot be
performed without large delays between successive operations to trans-
form these outputs into LSDF form. Second, multiplications in the
LSDF mode produce the least significant half of the result first which
may not be used in subsequent operations because of limited precision.
Especially for control algorithms with many multiplications, computa-
tion time and necessary control logic increase significantly with LSDF
arithmetic (see Fig. 1.2c).
This thesis introduces the new concept of using a known
MSDF serial arithmetic, called on-line arithmetic, in real-time control
systems in order to avoid the LSDF problems whilst still keeping the
advantageous features of digit-serial computations, such as a small gate
count and a low number of interconnections. The use of on-line
arithmetic for control algorithms permits an overlap of computation with the
A/D conversion (see Fig. 1.2d) as well as with the shift register of the
D/A converter. This property, undiscovered until now, offers
additional computation time which has so far been exploited by neither parallel
nor LSDF arithmetic. The results are designs with a low gate
count (serial operators), short computation time (potential overlap) and
low power consumption (low clock frequency because of overlap, no bus
access, short connections between subsequent operators).
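The timing argument above can be made concrete with a rough cycle count. The following Python sketch is an illustration only and is not taken from the thesis. It assumes a serial A/D converter that delivers one digit per clock cycle, most significant digit first, and on-line operators that emit result digit j a fixed number of cycles (the on-line delay) after receiving input digit j. The function names, the operand length of 16 digits and the delay values 3 and 2 are hypothetical.

```python
def online_chain_cycles(n_digits, online_delays):
    """Cycles until the last result digit leaves a chain of on-line operators.

    Assumption: input digit j arrives in cycle j (for example from a serial
    A/D converter), and every operator in the chain adds only its on-line
    delay, because it starts working as soon as the first digits arrive.
    """
    return n_digits + sum(online_delays)


def non_overlapped_cycles(n_digits, n_stages):
    """Cycles when every stage waits for the complete word of its predecessor.

    Assumption: each stage (conversion or operation) needs n_digits cycles.
    """
    return n_digits * n_stages


if __name__ == "__main__":
    n = 16            # hypothetical operand length in digits
    delays = [3, 2]   # hypothetical on-line delays: multiplier, adder
    # A/D conversion overlapped with multiplication and addition
    print(online_chain_cycles(n, delays))
    # A/D conversion, then multiplication, then addition, without overlap
    print(non_overlapped_cycles(n, 3))
```

Under these assumptions the overlapped chain finishes after 21 cycles instead of 48. The real figures depend on the operators and the converter used.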
1.2 Related Work
During the first 10 years after the introduction of on-line arithmetic in 1977
[ET77], mostly theoretical results were published [OI79, EG80,
OE82, EG83, Erc84]. The main goal in this period was to develop
algorithms in the on-line form for different mathematical operations.
Figure 1.2: Timing and size aspects of the computation
a x_1 + b x_2 + c x_3 with different operational schemes: a) sequential
processing with simple operators, b) complex digit-parallel operator,
c) standard serial arithmetic (LSDF), d) on-line arithmetic operators
(MSDF). Legend: com: fetching instructions and data via buses and links;
A/D, D/A: analog/digital, digital/analog conversion; op i: ith operation
(the three products and the sum Σ); t_d: controller dead time.
The interconnection of operators into complex algorithms and their
realization in hardware were first studied later, on some very special
implementation examples, such as singular value decomposition (SVD)
algorithms [EL87a, EL88b] and recursive digital filters [EL88a, BWE89,
BEW89, Cha91, FE92]. In both cases computational speed was the
main objective. This led to highly optimized but specific computation
structures which are difficult to adapt to other algorithms.
In the same period of time, design procedures for the systematic
development of single on-line arithmetic operators were investigated.
In these studies implementation criteria were also considered [EL88a,
Tu90], but only at the operator level and not with regard to their
interconnections. In all these earlier works, the implementation of algorithms
with several different operators required a specialist with detailed
knowledge of computer arithmetic.
At the beginning of the 1990s, Ercegovac [Erc91] and Moran [MRM93]
tried to bring the theoretical results of on-line arithmetic closer to
practical use. They already recognized some of the basic interconnection
concepts, but used them in an incomplete and non-systematic way.
The need for a normalization algorithm in loops, for example (more
details are provided in Sect. 3.1.2), was mentioned in [Erc91] and in
[MRM93], but the solutions given do not solve the problem in its
general form (e.g. no multi-adders are considered).
In parallel with the present work, three related subjects have been
treated by A. Tisserand from the École Normale Supérieure de Lyon
(ENS Lyon), now with the Centre Suisse d’Électronique et de
Microtechnique (CSEM, Neuchâtel).
The first one is an automatic generator of polynomial evaluations
which uses lookup tables for the first operand digits to reduce the on-
line delay. A polynomial evaluation of this type was used for example in
a neural network implementation in order to compute the tanh function
[GT96, GT99].
The second topic is a special Field Programmable Gate Array for
on-line arithmetic [TMP99]. This circuit, called Field Programmable
On-Line Operator (FPOP, patent pending), includes a set of serial A/D
and D/A converters and a two-dimensional array of on-line arithmetic cells whose
functionalities as well as interconnections are programmable. The
execution control structure is similar to the one presented in this thesis. In
order to accelerate the division and square root operations, the circuit
is realized in radix 4. The principal problem of a higher radix is to
choose an appropriate digit set and its bit-level coding in order to make
the elementary operations (additions, multiplications, normalizations,
inverse of digits) simple and fast. The individual cell structure based
on this coding is one of the key questions of this project.
The third parallel project deals with low power consumption circuits.
The goal of this project is to compare on-line arithmetic
implementations with conventional solutions with respect to power consumption and
to give guidelines on the cases in which an on-line solution is superior and
should be preferred. General statements for this kind of problem are
difficult because of the large number of influencing parameters.
In the last few years, on-line arithmetic algorithms have also been
employed for software applications [DMT97]. The on-line arithmetic
operators have the interesting property that the precision can be dy-
namically adapted by the number of digits shifted through the opera-
tors. The internal operations for the result digit generation are for this
purpose realized by ordinary digit-parallel operators.
1.3 Scope and Contributions of the Thesis
The goal of this thesis is to improve the hardware implementation of
real-time digital controllers in micro-systems. The complexity of the
proposed method should be manageable by an application engineer even
without detailed knowledge in computer arithmetic.
The class of controllers covered here comprises those with a fixed
structure, mostly in state-space representation or as difference equations
(see for instance [Vac95, FPW98]). The controller equations can be represented
in the following form:
z_{k+1} = f(z_k, s_k, r_k)   (state equations)   (1.1)
u_k = g(z_k, s_k, r_k)   (output equations)
where z_k, s_k, r_k and u_k are the controller states, the measurements, the
references and the controller outputs, respectively. The controller states
act here as auxiliary variables. The functions f and g can include all
kinds of nonlinearities like trigonometric coordinate transformations or
polynomial approximations. However, iterative methods with decision
branches like Model Predictive Control (MPC) or discrete event systems
are not investigated.
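As an illustration of the controller form (1.1), the following Python sketch evaluates a fixed-structure controller sample by sample. It is a minimal example under stated assumptions and not the implementation used in the thesis: the helper run_controller, the choice of a discrete PI law for f and g, the error definition r - s, and the gains KP and KI are all hypothetical.

```python
def run_controller(f, g, z0, measurements, references):
    """Evaluate u_k = g(z_k, s_k, r_k) and z_{k+1} = f(z_k, s_k, r_k) per sample."""
    z = z0
    outputs = []
    for s, r in zip(measurements, references):
        outputs.append(g(z, s, r))  # output equation uses the current state
        z = f(z, s, r)              # state equation prepares the next sample
    return outputs


KP, KI = 2.0, 0.5  # hypothetical proportional and integral gains


def f(z, s, r):
    """State equation: the state z accumulates the control error r - s."""
    return z + (r - s)


def g(z, s, r):
    """Output equation: proportional term plus integral term."""
    return KP * (r - s) + KI * z


if __name__ == "__main__":
    # Constant reference 1.0 and measurements held at 0.0 for three samples
    print(run_controller(f, g, 0.0, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
```

The structure is fixed in the sense used above: every sampling period executes the same operations in the same order, with no data-dependent branching.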
As already stated in Sect. 1.1, on-line arithmetic seems to be well
suited for pipelining A/D conversions and digital control algorithms
of fixed structure. However, in the past, on-line arithmetic was both
unknown to control engineers and in an inconvenient form for them, and
computer arithmetic specialists were not aware of the requirements of
control systems. The consequence was that some implementation
concepts were left out and that the effort required for an efficient controller
implementation was too great for control engineers. This thesis
aims to close that gap by adding the missing implementation concepts
to the theory and by giving guidelines for a systematic construction of
control algorithms in on-line arithmetic.
The main contributions of this thesis can be summarized as follows:
• The overlap of A/D conversion and computation is proposed with
the goal of accelerating the computation.
• Two design concepts for the systematic construction of on-line
arithmetic algorithms are introduced. The Modular On-Line Arith-
metic Operator scheme has not yet been published. The Global
Execution Control scheme has already been implicitly used several
times, but not clearly analyzed and described (see e.g. [BEW89,
Cha91]).
• The normalization algorithm of Merrheim [Mer94] is extended for
a wider class of operations (with δ > 2).
• Appropriate controller representations for on-line arithmetic im-
plementations are discussed.
• A basic on-line library is implemented and its structure is given.
• The question of the choice of the radix is investigated.
• Two controllers are implemented and tested.
1.4 Outline of the Thesis
The structure of the thesis follows a path from a general introduction
of on-line arithmetic to the suggested extensions and implementation
guidelines.
First, an overview of on-line arithmetic is given in Chap. 2. The
general operator structure is explained. This includes important char-
acteristics like on-line delay and period as well as the redundant number
systems used. The latter allow parallel additions without carry
propagation and thus serial computations in MSDF order. Chapter 2 will
give insight into how the basic on-line arithmetic operators (on-line
addition, multiplication) work and how the input and output data can be
converted between redundant and standard number systems. In the later
chapters, the internal structure of the on-line operators is of little im-
portance. The focus is mainly on their interfaces for the interconnection
of different operators.
In Chap. 3, two design concepts are discussed which extend the ba-
sic on-line arithmetic in a way that simplifies the implementation of
the desired controllers. The first design method imposes a common
interface for all operators and forces the system designer to specify a
common scale and number of significant digits for all intermediate
results in advance. Afterwards, a controller is constructed simply by
connecting these modular operators into a complex algorithm. These
modular on-line operators become possible through appropriate
initialization and normalization extensions of the basic on-line
operators. These necessary extensions are discussed in detail.
Modularity is important for an inexperienced user but also demands a
certain sacrifice in hardware size and computation speed. Therefore, a
second design
method is introduced which leads to smaller and faster solutions. It
leaves the scale of the intermediate results open but demands slightly
more insight into how to place initialization and normalization units.
Additional implementation guidelines for the use of the two design
concepts are given in Chap. 4. In the first part of this chapter mostly
control specific aspects are discussed, for example advantageous con-
troller representations and the simplification of multi-adders as they
appear in many controllers. The second part is dedicated to hardware
and software aspects of controller implementation. Field programmable
gate arrays are introduced and an on-line arithmetic library of the ba-
sic operators and extensions, developed in collaboration with Arnaud
Tisserand from ENS, Lyon, is discussed.
In serial arithmetic the choice of the radix plays an important role
because it markedly changes the number representation (for a higher
radix the operand length is smaller) and therefore the number of
clock cycles necessary for a specific operation. However, this gain in
speed is offset by a significant increase in hardware size. This
trade-off is illustrated in Chap. 5. For implementations of higher
complexity (non-linear operations like divisions and square roots) with
hard computation time constraints, higher radixes are often advanta-
geous. However, for most control applications radix 2 implementations
are fast enough and smaller in size.
In Chap. 6, the proposed on-line arithmetic solutions are compared
to digit-parallel implementations, taking into account the computation
time constraints imposed by the sampling period.
This comparison provides hints for the choice between an on-line or
a digit-parallel solution. Providing quantitative results for the criteria
speed, size and power consumption is a difficult task because of the high
number of influencing parameters.
The theory and guidelines presented in the earlier chapters are ap-
plied to two controller examples, presented in Chap. 7. The first imple-
mentation, a classical PID controller for a space application, represents
a case where on-line arithmetic is superior to digit-parallel arithmetic
because of its small operator size and the simplicity of the control algo-
rithm. In the second example, i.e. a two degrees of freedom controller
for a piezo system, the controller complexity requires a large number of
simple operations (multiplications) in on-line arithmetic, but only one
multiply–add operator is necessary in the digit-parallel case because of
low computation time constraints. However, even in this unfavorable
situation, on-line arithmetic outperforms digit-parallel arithmetic with
regard to circuit size and clock speed (important for power consump-
tion).
Finally, Chap. 8 discusses the main contributions and relates the
available results to industrial requirements. It also points out where
further research is needed to improve or extend the results presented.
It should be emphasized that the block sizes of operators in figures
are only chosen for clear representation and not in order to compare
the real operator sizes. Therefore, they are often not to scale. It could
be misleading that the large multiplication operations often seem to be
smaller than the small final adders.

Chapter 2
On-Line Arithmetic:
A Short Overview
On-line arithmetic appears to be little known, except by a few groups
of researchers who have developed the theory during the last 20 years
[ET77, BDKM94]. For that reason, a short introduction is given here.
Further details can be found in [Erc84, EL88a].
In on-line arithmetic the operands, as well as the results, flow through
arithmetic units in a digit-serial fashion starting with the most signifi-
cant digit first (MSDF).
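[Figure 2.1 (schematic): an on-line operator with delay δ and period τ
receives the operand digits x_{i+δ} and y_{i+δ} and delivers the result
digit p_i. The timing diagram shows the operand digits x1 ... x7, the
result digits p1 ... p7 and the initial invalid outputs.]
Figure 2.1: Delay and clock period of on-line operations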
Important characteristics of on-line operators are (see Fig. 2.1):
• Their delay δ which is defined as the difference in rank between
input digits and output digits. This number depends on the chosen
algorithm and the radix. Usually, the on-line delay is a small
integer (e.g. 1 to 4). In computer architecture literature this value
is usually called latency of a pipeline associated to an operator.
• Their period τ . The period is the time needed by the signal to
cross through the longest path of the circuit (electrical propaga-
tion delay). This value limits the maximum clock frequency.
In Fig. 2.2 an example of an on-line arithmetic computation is given.
The delays are indicated below the operators. Some registers are nec-
essary for synchronization in the lower path.
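[Figure 2.2 (schematic): data-flow graph with inputs a and b and output
sin²a + log b, built from sin, log, squaring (x²) and addition (+)
operators with synchronization registers; the individual delays are
annotated below the operators and the total delay is 13.]
Figure 2.2: Example of an on-line arithmetic computation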
The principal advantages of on-line arithmetic are:
• The parallelism due to the digit-level pipeline, which allows an
overlap of successive operations.
• The small size of operators (see Tab. 2.3).
• The small number of interconnections.
• All common operations can be computed in on-line arithmetic
(division, square root, sin, cos, logarithm, exponential ...).
• The precision can be easily controlled (by the number of digits
shifted through).
Serial computations with the most significant digits first become pos-
sible owing to a change in the number system (see also Sect. 2.1). The
redundant number systems used [Avi61] allow several representations
for the same number.
Example: The number $0.a_1a_2 = \sum_{i=1}^{2} a_i r^{-i}$ of radix $r = 2$ can have
negative $a_i$. This leads to several representations for some numbers
($\frac{1}{4} = 0.a_1a_2$ with $a_1 = 0$ and $a_2 = 1$, or with $a_1 = 1$ and $a_2 = -1$).
On-line arithmetic was introduced by Ercegovac and Trivedi in 1977
[ET77]. Nowadays, on-line algorithms are available for all common
arithmetic operations, in the fixed-point representation as well as in
the floating-point representation, but they have been rarely used in
hardware applications (e.g. [BDKM94, EMT95, NM96, Erc78, Tu90]).
This is mainly due to the different original motivation (high-precision
computation) and the lack of a convenient formulation for an efficient
hardware implementation.
In recent years more effort has been spent on implementation is-
sues of single on-line arithmetic operators (e.g. [BDKM94, Tu90]). Two
different approaches have been chosen. One follows the recursive
formulation of Ercegovac [Tu90] and the other [BDKM94] is based on
Avizienis’ parallel adder (Fig. 2.3b). The former approach uses a general
formulation which is valid for all operations computable with on-line
arithmetic. In this framework the ith digit of the result is generated
from the (i+ δ)th input digit and an intermediate state with a so-called
digit-selection function. The overall functionality (e.g. on-line addition)
is determined by the choice of this function. The computation of the
digit selection function and the state update are often done by standard
digit-parallel arithmetic operators. In the latter approach the output
digits are generated in a forward fashion without recursion. This leads
to much smaller implementations but is limited to a few operations (ad-
dition, multiplication). The application examples presented at the end
of this thesis (see Chap. 7) are mainly concerned with size and power
consumption requirements and additions/multiplications represent the
majority of the operations. Therefore, the second approach will be in-
troduced in more detail in this chapter. However, all implementation
guidelines given in Chap. 3 concern only the interface between on-line
operators and thus they are also valid for operators of the first type.
2.1 Redundant Number Systems
In a usual number system, a positive fractional number $A \in \mathbb{R}^+$ is
written using a radix $r$ ($r > 0$) as $A = \sum_{k=1}^{\infty} a_k r^{-k}$, with
$a_k \in D = \{0, 1, \ldots, r-1\}$ for all $k$, where $D$ is called the digit set
and $k$ is called the rank.
In 1961, Avizienis [Avi61] proposed to represent radix r numbers
using a signed digit set Dr = {−a,−a+ 1, . . . , a− 1, a}, where a ≤ r−1.
The sign assignment is done on the digit level. Thus, negative numbers
are treated similarly to positive numbers. Owing to the negative digits
these systems are called signed number systems. For 2a + 1 ≥ r, all
numbers are representable. If the number of elements in Dr is larger
than r (2a + 1 > r) then some numbers have several representations.
For example, the number 2435 (in the usual system) in radix 10 with
the digit set {−5,−4,−3,−2,−1, 0, 1, 2, 3, 4, 5} can be written as 2435
or 244(−5). Therefore, the system is called redundant.
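The redundancy described above can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `sd_integer_value`, `sd_fraction_value` and `representations` are hypothetical, and exact rational arithmetic is used only for illustration. It evaluates the two radix 10 representations of 2435 and enumerates all two-digit radix 2 representations of 1/4 with digits from {−1, 0, 1}.

```python
from fractions import Fraction
from itertools import product


def sd_integer_value(digits, radix):
    """Value of an integer written with signed digits, most significant first."""
    value = 0
    for d in digits:
        value = value * radix + d
    return value


def sd_fraction_value(digits, radix):
    """Value of 0.d1 d2 ... dn written with signed digits."""
    return sum(Fraction(d, radix ** k) for k, d in enumerate(digits, start=1))


def representations(target, radix, a, n):
    """All n-digit fractional representations of target with digits in {-a..a}."""
    digit_set = range(-a, a + 1)
    return [ds for ds in product(digit_set, repeat=n)
            if sd_fraction_value(ds, radix) == target]


# 2435 and 244(-5) denote the same number in radix 10
print(sd_integer_value([2, 4, 3, 5], 10), sd_integer_value([2, 4, 4, -5], 10))
# prints: 2435 2435

# 1/4 has two representations with two signed binary digits
print(representations(Fraction(1, 4), radix=2, a=1, n=2))
# prints: [(0, 1), (1, -1)]
```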
Redundant number systems are of particular interest because there
exist algorithms for full parallel additions without carry propagations.
The algorithm 2.1, proposed by Avizienis in [Avi61], shows such a carry-
free parallel addition for radixes higher than 2.
Algorithm 2.1 : Parallel addition (Avizienis 1961)
Inputs : $x = 0.x_1x_2 \ldots x_n$ and $y = 0.y_1y_2 \ldots y_n$
Result : $s = s_0.s_1s_2 \ldots s_n$
These numbers are written in radix $r$ with digits from the digit set
$\{-a, \ldots, 0, \ldots, a\}$, where $2a \ge r + 1$ and $a \le r - 1$. One defines
$w_0 = t_n = 0$
I) For $i \in [1, n]$ in parallel, perform :
$$t_{i-1} = \begin{cases} 1 & \text{if } x_i + y_i > a - 1 \\ 0 & \text{if } -a + 1 \le x_i + y_i \le a - 1 \\ -1 & \text{if } x_i + y_i < -a + 1 \end{cases}$$
$$w_i = x_i + y_i - r \times t_{i-1}$$
II) For $i \in [0, n]$ in parallel, perform :
$$s_i = w_i + t_i$$
In algorithm 2.1 the carry $t_{i-1}$ depends only on $x_i$ and $y_i$, not on the
incoming carry $t_i$. Therefore, there is no carry propagation and the
computation time for additions is independent of the number size (O(1)).
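As an illustration of the two steps of algorithm 2.1, the following sketch evaluates them sequentially in software. It is an assumption-laden model, not the hardware realization: the function names `avizienis_add` and `sd_value` are hypothetical, and the loops stand in for operations that are performed in parallel for all ranks.

```python
from fractions import Fraction


def avizienis_add(x, y, r, a):
    """Carry-free addition of x = 0.x1..xn and y = 0.y1..yn (sketch of Alg. 2.1).

    x, y: lists of signed digits in {-a..a}, most significant digit first.
    Returns the digit list [s0, s1, ..., sn] of the sum.
    """
    if not (2 * a >= r + 1 and a <= r - 1):
        raise ValueError("digit set does not satisfy 2a >= r+1 and a <= r-1")
    n = len(x)
    if len(y) != n:
        raise ValueError("operands must have the same length")
    t = [0] * (n + 1)   # t[i]: transfer digit of rank i, with t[n] = 0
    w = [0] * (n + 1)   # w[i]: interim sum of rank i, with w[0] = 0
    # Step I: each rank only looks at its own pair of input digits
    for i in range(1, n + 1):
        z = x[i - 1] + y[i - 1]
        if z > a - 1:
            t[i - 1] = 1
        elif z < -a + 1:
            t[i - 1] = -1
        else:
            t[i - 1] = 0
        w[i] = z - r * t[i - 1]
    # Step II: final digits, again independent for every rank
    return [w[i] + t[i] for i in range(n + 1)]


def sd_value(digits, r, first_rank):
    """Exact value of a signed-digit string whose first digit has rank first_rank."""
    return sum(Fraction(d, r ** k) for k, d in enumerate(digits, start=first_rank))


# radix 10 with digit set {-6..6}: 0.5(-6) + 0.66 = 0.44 + 0.66 = 1.10
s = avizienis_add([5, -6], [6, 6], r=10, a=6)
print(s, sd_value(s, 10, 0))
# prints: [1, 1, 0] 11/10
```

Because the interim sums stay within ±(a − 1) and the transfer digits within ±1, every result digit remains in the digit set, which is why no second carry pass is needed.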
The algorithm of Avizienis presented above is not valid for radix 2
because the conditions 2a ≥ r + 1 and a ≤ r − 1 cannot be satisfied
simultaneously. However, there are algorithms in radix 2 guarantee-
ing a constant computation time which use the carry-save (digits from
{0,1,2}) or the borrow-save (digits from {-1,0,1}) representations. The
carry-save representation is often used in multipliers. In this chapter we
have chosen the borrow-save representation because of the easy handling
of negative numbers.
The borrow-save representation was introduced by A. Guyot, Y. Herreros
and J.M. Muller in [GHM89]. The digit set is {−1, 0, 1}, and the
bit-level representation of the digits is defined as follows: the ith digit
$a_i$ of a number $a$ is represented by two bits, $a_i^+$ and $a_i^-$, such that
$a_i = a_i^+ - a_i^-$. The digit codings are given by Tab. 2.1.
digit representation (a+, a−)
−1 (0, 1)
0 (0, 0) or (1, 1)
1 (1, 0)
Table 2.1: Digit representation in borrow-save.
Example for the borrow-save notation (negative digits are indicated
by a bar, e.g. $-1 = \bar{1}$):
$0.625 = 0.101 = (0, 0).(1, 0)(0, 0)(1, 0)$
$0.625 = 0.11\bar{1} = (0, 0).(1, 0)(1, 0)(0, 1)$
The algorithm 2.2, proposed in [GHM89], shows the carry-free par-
allel addition for radix 2 in the borrow-save representation.
Algorithm 2.2 : Parallel borrow-save addition [GHM89]
Inputs : $x = 0.a_1a_2 \ldots a_n$ and $y = 0.b_1b_2 \ldots b_n$
Result : $s = s_0.s_1s_2 \ldots s_n$
These numbers are written in radix 2 with digits from the digit set
{−1, 0, 1}
I) Initialization : $c_n^+ = s_n^- = 0$
II) For $i \in [1, n]$ in parallel, compute $c_{i-1}^+$ and $c_i^-$ from:
$$a_i^+ + b_i^+ - a_i^- = 2c_{i-1}^+ - c_i^-$$
III) For $i \in [1, n]$ in parallel, compute $s_{i-1}^-$ and $s_i^+$ from:
$$c_i^- + b_i^- - c_i^+ = 2s_{i-1}^- - s_i^+$$
one defines: $s_0^+ = c_0^+$
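The borrow-save addition can be modelled bit by bit in a few lines. The sketch below is only a software illustration under stated assumptions: the names `ppm`, `bs_add` and `bs_value` are hypothetical, digits are passed as (plus, minus) bit pairs as in Tab. 2.1, and the two loops emulate steps II and III, which a circuit would evaluate in parallel for all ranks.

```python
from fractions import Fraction


def ppm(x, y, z):
    """Bit-level cell: returns bits (t, u) such that x + y - z = 2*t - u."""
    v = x + y - z                 # v lies in {-1, 0, 1, 2}
    t = 1 if v >= 1 else 0
    u = 2 * t - v
    return t, u


def bs_add(a, b):
    """Add two borrow-save numbers 0.a1..an and 0.b1..bn.

    a, b: lists of (plus, minus) bit pairs for the ranks 1..n.
    Returns the (plus, minus) pairs of s0.s1..sn for the ranks 0..n.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length")
    c_plus = [0] * (n + 1)    # c+_0 .. c+_n, with c+_n = 0 (step I)
    c_minus = [0] * (n + 1)   # c-_1 .. c-_n (index 0 unused)
    s_plus = [0] * (n + 1)
    s_minus = [0] * (n + 1)   # s-_n = 0 (step I)
    # Step II: a+_i + b+_i - a-_i = 2 c+_{i-1} - c-_i
    for i in range(1, n + 1):
        (a_p, a_m), (b_p, _) = a[i - 1], b[i - 1]
        c_plus[i - 1], c_minus[i] = ppm(a_p, b_p, a_m)
    # Step III: c-_i + b-_i - c+_i = 2 s-_{i-1} - s+_i
    for i in range(1, n + 1):
        b_m = b[i - 1][1]
        s_minus[i - 1], s_plus[i] = ppm(c_minus[i], b_m, c_plus[i])
    s_plus[0] = c_plus[0]
    return list(zip(s_plus, s_minus))


def bs_value(pairs, first_rank=1):
    """Exact value of a borrow-save digit string."""
    return sum(Fraction(p - m, 2 ** k)
               for k, (p, m) in enumerate(pairs, start=first_rank))


# the two encodings of 0.625 from the example above add up to 1.25
x = [(1, 0), (0, 0), (1, 0)]      # 0.101
y = [(1, 0), (1, 0), (0, 1)]      # 0.11(-1)
print(bs_value(bs_add(x, y), first_rank=0))
# prints: 5/4
```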
Both algorithms, Alg. 2.1 and Alg. 2.2, can be formulated into digit-
serial forms. A digit of rank i depends on input digits of rank i+ 1 and
i + 2 in Alg. 2.1 and Alg. 2.2, respectively. The on-line versions will
therefore have the delays 1 and 2, respectively. Despite the larger delay,
the borrow-save algorithm is preferred in the applications considered
here because of the simpler digit representation and smaller operator
size (more details in Chap. 5).
For the conversion between standard and redundant radix 2 num-
bers, see Sect. 2.4.
2.2 On-Line Arithmetic Operators
The borrow-save number system allows arithmetic operations in a fast
and convenient way and, as mentioned above, without carry propagation.
It is this property in particular which makes digit-serial computation
in the MSDF direction possible. In order to give an idea of the
internal complexity of on-line operators, the addition of two numbers,
the addition of several numbers, the multiplication by a constant
number and polynomial evaluation are described in more detail. These
are the operations used most frequently in controller implementations. For
the division algorithm only the basic idea is given. The subsections
about polynomial evaluation and division can be found in original and
more detailed form in the thesis by A. Tisserand [Tis97]. The operator
examples given are, for simplicity reasons, in radix 2. On-line arithmetic
in radix 2 has been studied in [Erc84, BDKM94] where more details can
be found.
2.2.1 On-Line Adder
Consider the following operation with numbers in the borrow-save rep-
resentation:
$$a = 0.a_1a_2 \ldots a_n = \sum_{i=1}^{n} (a_i^+ - a_i^-) 2^{-i}$$
$$b = 0.b_1b_2 \ldots b_n = \sum_{i=1}^{n} (b_i^+ - b_i^-) 2^{-i}$$
$$a + b = s = s_0.s_1s_2 \ldots s_n = \sum_{i=0}^{n} (s_i^+ - s_i^-) 2^{-i}$$
It is shown in [BDKM94] that the digits si of s can be obtained either
with the parallel carry free architecture presented in Fig. 2.3a (corre-
sponding to algorithm 2.2) or with the corresponding on-line operator
in Fig. 2.3b. Note that the size of the on-line adder is independent
of the operand length whilst the parallel adder grows linearly with the
operand length.
[Figure 2.3 (schematic): a) parallel adder for n = 4, in which ppm cells
reduce the operand bits $a_i^+$, $a_i^-$, $b_i^+$, $b_i^-$ of each rank via the
intermediate bits $c_i^+$, $c_i^-$ to the result bits $s_0^\pm \ldots s_4^\pm$;
b) on-line adder built from two ppm cells and registers (reg), with
inputs $a_i^\pm$, $b_i^\pm$ and outputs $s_{i-2}^\pm$.]
Figure 2.3: a) A parallel adder and b) an on-line adder
(ranks are indicated by indexes) [BDKM94]
The main building blocks for both algorithms are ppm cells (plus plus
minus), which reduce 3 bits, $x_i$, $y_i$ and $z_i$, of the same rank to 2 bits,
$u_i$ and $t_{i-1}$, one of the same rank and the carry, so that
$x_i + y_i - z_i = 2t_{i-1} - u_i$. A ppm cell is very similar to a standard full
adder cell, apart from an additional inverter, as shown in Fig. 2.4. In
the parallel addition algorithm of Fig. 2.3a, carry propagation is avoided
by successively reducing groups of 4 bits ($a$, $b$) of the same rank to
3 bits of the same rank for an intermediate representation ($c$) and
finally to 2 bits of the same rank for the result ($s$). The on-line adder
is derived from the parallel scheme. As shown in Fig. 2.3b, the on-line
delay of the adder is δ = 2, which means that two operand digits have to
be clocked into the operator before result digits appear on the output.
Subtractions ($a - b$) are realized by exchanging the positive ($b^+$) and
negative ($b^-$) bits on the input.
2.2.2 On-Line Multi-Adders
In [BDKM94] it was shown that the idea of reducing the number of
bits by ppm cells (every ppm reduces the number of bits by 1) leads to
an efficient multiple number addition operator (N numbers), an
operation which is common in polynomial and state-space controllers. For
inputs with the same rank it has an optimal delay of $\delta_{opt} = \lceil \log_2 N \rceil + 1$
(instead of $\delta = \lceil 2 \log_2 N \rceil$ for a binary tree of adders) and it is easily
extendable to inputs of different ranks. This possible combination of
single operators to more specific ones reduces the on-line delay and gate
number and prevents the appearance of intermediate results. Especially
the last point is very important in order to avoid truncation errors in
polynomial expressions where intermediate results are often very
different in scale from the final result.
[Figure 2.4 (schematic): mmp and ppm cells with inputs $x_k$, $y_k$, $z_k$
and outputs $t_{k-1}$, $u_k$.]
Figure 2.4: mmp (minus minus plus) and ppm (plus plus minus) cells
(indexes indicate ranks) [BDKM94]
A multi-adder example with three inputs of the same rank k is shown
in Fig. 2.5. At the input, 6 lines with rank k enter into the adder. They
are reduced by ppm cells and registers, respectively, until there are only
two lines of the same rank left (see Tab. 2.2). The on-line delay of the
resulting multi-adder is δ = 3. The same operation realized by simple
adders in a pipeline leads to an on-line delay of δ = 4.
6 × (k)
  -- 2 ppm --> 2 × (k), 2 × (k − 1)
  -- 2 reg --> 4 × (k − 1)
  -- 1 ppm --> 2 × (k − 1), 1 × (k − 2)
  -- 2 reg --> 3 × (k − 2)
  -- 1 ppm --> 1 × (k − 2), 1 × (k − 3)
  -- 1 reg --> 2 × (k − 3)
Table 2.2: Computation sequence for multi-adder of Fig. 2.5
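The two delay expressions quoted for N inputs of the same rank can be tabulated with a short script. This is only a sketch that evaluates the formulas as stated in the text; the function names `multi_adder_delay` and `adder_tree_delay` are hypothetical, and N ≥ 2 is assumed.

```python
import math


def multi_adder_delay(n_inputs):
    """Delay of the multi-adder as quoted: ceil(log2 N) + 1."""
    return math.ceil(math.log2(n_inputs)) + 1


def adder_tree_delay(n_inputs):
    """Delay of a binary tree of on-line adders as quoted: ceil(2 log2 N)."""
    return math.ceil(2 * math.log2(n_inputs))


for n in (2, 3, 4, 8, 16):
    print(n, multi_adder_delay(n), adder_tree_delay(n))
```

For N = 3 this reproduces the values given for the example, namely a delay of 3 for the multi-adder against 4 for the adders in a pipeline.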
[Figure 2.5 (schematic): the inputs $a_k^\pm$, $b_k^\pm$, $c_k^\pm$ of rank k are
reduced by ppm cells and registers through the ranks k − 1 and k − 2 to
the outputs $s_{k-3}^\pm$.]
Figure 2.5: A multi-adder with 3 inputs of the same rank
(intermediate ranks are indicated on the connections) [BDKM94]
An interesting property of on-line adders is that their size is
independent of the operand’s length. In [Mul94], a characterization of
functions computable with on-line operators bounded in size is given. The
piecewise affine functions with rational coefficients belong to this class
(functions like $f(x) = ax + b$ and $f(x, y) = ax + by + c$, with $a, b, c \in \mathbb{Q}$).
However, operations like multiplications, divisions or square root
computations do not belong to this class. Their size is proportional to the
operand’s length.
2.2.3 On-Line Multiplication
In the literature, several on-line multipliers have been presented (see
for example [ET77, BDKM94]). In radix 2, there exists an architecture
with an optimal delay of 2, but its period grows with the size of the
operands. Mostly, on-line multipliers with delay 3 and a constant period
are chosen. Here, the basic idea of an on-line multiplier with a constant
number is given because it represents a common operation in linear
controllers.
It is necessary to compute the product $p = x \times a$ in on-line arithmetic,
where $x = 0.x_1x_2 \ldots x_n$ is the input, $a = 0.a_1a_2 \ldots a_n$ is a constant
number and $p = 0.p_1p_2 \ldots p_{2n}$ is the product, all represented in
the borrow-save notation. The following partial products $P^{(k)}$ have to
be computed as the digits of $x$ become available for $k \le n$:
$$P^{(0)} = 0$$
$$P^{(k+1)} = P^{(k)} + x_{k+1} 2^{-k-1} a$$
In an implementation with the optimal on-line delay δ = 2, $P^{(k+1)}$ is
computed as follows (see Fig. 2.6):
The partial product $x_{k+1} 2^{-k-1} a$ is obtained using digit by
digit products (realized by multiplexers, see the lower part of
Fig. 2.6). This is added to the former intermediate result to
form the new intermediate result (stored in registers, upper
part of Fig. 2.6). Contrary to a digit-parallel multiplier, not
all digits of the intermediate result are stored in registers,
but the two leading digits are separated. They form serial
outputs of rank k + 1 and k + 2, respectively. These serial
outputs are fed into an on-line adder which produces the
intended product in serial form (right side of Fig. 2.6). The
construction of the final adder is similar to the multi-adders
shown above.
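The recurrence for the partial products can be followed numerically. The sketch below works on exact rational values only and is an assumption for illustration: it does not model the registers, the multiplexers or the separation of the two leading digits, and the name `partial_products` is hypothetical.

```python
from fractions import Fraction


def partial_products(x_digits, a):
    """Return [P(1), ..., P(n)] for p = x * a.

    x_digits: signed digits x1..xn in {-1, 0, 1}, most significant first.
    a: the constant, given as an exact Fraction.
    """
    p = Fraction(0)                      # P(0) = 0
    history = []
    for k, x_next in enumerate(x_digits):
        # P(k+1) = P(k) + x_{k+1} * 2^-(k+1) * a
        p += x_next * Fraction(1, 2 ** (k + 1)) * a
        history.append(p)
    return history


x = [1, 1, -1]                           # 0.11(-1) = 0.625
a = Fraction(11, 16)                     # 0.1011 in radix 2
for k, p in enumerate(partial_products(x, a), start=1):
    print(k, p)
```

After the last digit the partial product equals x × a, here 5/8 × 11/16 = 55/128, whatever redundant representation of x was fed in.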
[Figure 2.6 (schematic): the incoming digit $x_{k+1}$ is multiplied digit by
digit (multiplexers) with the digits $a_1 \ldots a_5$ of the constant a; a
parallel adder and borrow-save registers (reg, 2 bit) hold the
intermediate result $s_0 \ldots s_5$; the serial outputs $p'_{k+1}$ and $p'_{k+2}$
feed the final on-line adder (+), which delivers $p_{k-1}$.]
Figure 2.6: On-line arithmetic constant multiplier (with n = 5 bit)
The period of the resulting multiplier is the time needed for the signal
to pass through 4 ppm cells, 1 multiplexer and 1 register, and its size
is independent of the operand length (O(1)), but grows linearly with
the length $n_a$ of the constant ($O(n_a)$, see [BDKM94, Mul94]). If shorter periods
are necessary (and a small increase of the on-line delay is acceptable)
intermediate registers can be added. The on-line delay of the multiplier
presented is determined by its final adder (δ = 2).
The on-line multiplier in Fig. 2.6 can be modified, as shown in
[BDKM94], in order to compute efficiently squares or binomials (ax+y,
where a is a constant number). This allows the computation of various
functions using polynomial approximations (sin, cos, exp, log . . .). The
separation of the combinatorial part and the final adder of the multi-
pliers allows the combination of several constant multipliers and adders
to polynomial operators with one common final adder. An example
is given later (see Sect. 7.2.2) for the implementation of a polynomial
controller for a piezo system. The first intermediate result is thereby
already the controller output and thus scaling and truncation errors are
reduced to a minimum.
For more details about other multipliers the reader is referred to the
existing literature [EL88a, BDKM94].
2.2.4 On-Line Division and Square Root
Several algorithms and implementations of on-line division have already
been proposed in the literature [ET77, Irw78, IO79, EL85, IO87, ET87,
LS87, ET89, LE92, MRM93, LE93b]. They are all based on the
recursive method of Ercegovac and this section presents an illustration
of this method. The computation of a division depends on the order
of magnitude of its operands, namely the divisor and the dividend. This
imposes some normalization procedures which make the algorithms more
or less complex. Usually, the division is an area-intensive operator and
there are several possible implementations. The same algorithm can
lead to different compromises between delay and size. It is possible
for example to keep the delay small by choosing a very complex (and
therefore large) digit selection function. No divisions are required for
the controller implementations in Chap. 7. However, in order to give the
reader an example for the recursive method of Ercegovac, we present the
algorithm of [MRM93] below. The result of this algorithm is $q = a/b$
with $a < b$ and $\frac{1}{2} \le b \le 1$.
As can be seen in algorithm 2.3, in every step an intermediate state
(w) is computed and a digit selection function (select) is evaluated. The
choice of these two parts specifies the computation (addition, division,
...) in this method.
Square root algorithms and divisions are very similar and thus sev-
eral algorithms have been proposed in the literature [Erc78, OE82,
LE93a, EL94]. They require the same compromise between delay and
size as divisions.
Algorithm 2.3 : On-line division (delay 5)
Initialization : $a[0] = 0.a_1a_2a_3a_4$, $b[0] = 0.b_1b_2b_3b_4$, $w[0] = a[0]$ and $q[0] = 0$
For $i$ from 1 to $n$ perform:
$$c_i = \mathrm{select}(2w[i-1])$$
$$w[i] = 2w[i-1] + a_{i+4} 2^{-4} + q[i-1] b_{i+4} 2^{-4} - c_i b[i-1]$$
$$b[i] = b[i-1] + b_{i+4} 2^{-i-4}$$
$$q[i] = q[i-1] + c_i 2^{-i}$$
where
$$\mathrm{select}(x) = \begin{cases} 1 & \text{if } x \ge \frac{1}{4} \\ -1 & \text{if } x \le -\frac{1}{4} \\ 0 & \text{else} \end{cases}$$
2.2.5 Evaluation of Polynomials
The fast evaluation of polynomials is important for scientific computa-
tions and special applications. Already in 1885, Weierstrass showed that
any continuous function can be approximated to an arbitrary accuracy
in a compact interval by polynomials. For controller implementations
we are specifically interested in their ability to approximate elementary
functions (sin, cos, log, exp, tan ...). For their evaluation several differ-
ent architectures have been proposed [DM88, MP90, MMY93]. Espe-
cially, the Horner scheme leads to very regular and modular realizations
(see Fig. 2.7). This regularity, which is particularly important for re-
alizations in integrated circuits and FPGAs, is a direct consequence of
the computation scheme:
$$P(x) = \sum_{i=0}^{d} a_i x^i = a_0 + x(a_1 + x(a_2 + x(\ldots(a_{d-1} + a_d x)\ldots)))$$
where $d$ binomiers ($ax + b$) are used in series for the evaluation of a
degree-$d$ polynomial.
In [Baj93, CDHM91] several studies of polynomial evaluations based
on on-line operators are presented. The direct use of the Horner scheme
for the implementation of a polynomial of degree d leads to an operator
with delay 3×d (the delay of a binomier is 3). In practice the period τ of
such an operator is often too long (longest path traverses all binomials)
and registers have to be inserted after each binomial. Therefore, the
on-line delay of an operator using the Horner scheme is 4× d.
[Figure 2.7 (schematic): chain of four binomial operators (ax + b) with
the coefficients a4, a3, a2, a1, a0 and the common input x, delivering
a0 + a1x + a2x² + a3x³ + a4x⁴.]
Figure 2.7: Evaluation of a polynomial (deg = 4) with Horner scheme
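The Horner scheme and the delay figures quoted for it can be summarized in a short sketch. It is a plain software model and not an on-line implementation: the names `horner` and `horner_delay` are hypothetical, and the per-binomial delays of 3 (without registers) and 4 (with one register after each binomial) are taken from the text.

```python
def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + ad*x^d with d binomial steps (a*x + b).

    coeffs: [a0, a1, ..., ad]
    """
    acc = coeffs[-1]
    for b in reversed(coeffs[:-1]):
        acc = acc * x + b            # one binomial operator
    return acc


def horner_delay(degree, registered=False):
    """On-line delay of the Horner chain: 3*d, or 4*d with inserted registers."""
    per_stage = 4 if registered else 3
    return per_stage * degree


print(horner([1, 2, 3], 2))          # 1 + 2*2 + 3*4
# prints: 17
print(horner_delay(4), horner_delay(4, registered=True))
# prints: 12 16
```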
In order to reduce this delay various other architectures have been
proposed. The divide-and-conquer method shown in Fig. 2.8 uses a tree
of binomiers [DM88]. This method can guarantee a logarithmic on-line
delay, but requires square operations. This leads to circuits which are
twice the size of those obtained with the Horner scheme. The objective
is mainly high speed.
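[Figure 2.8 (schematic): tree of binomial operators and a squaring
operator (x²) combining the coefficients a3, a2, a1, a0 with the input x
to a0 + a1x + a2x² + a3x³.]
Figure 2.8: Divide-and-conquer architecture for polynomial evaluation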
The E-method proposed by Ercegovac [Erc77] is a method, inspired
by the Horner scheme, which allows the evaluation of polynomials of
degree d with an on-line delay of d. In [Tis94, EMT95] an on-line im-
plementation of the E-method on a DEC-PeRLe1 card was studied. This
card, designed by the Paris Research Laboratory of DEC [BRV89], con-
sists of a matrix of 16 FPGA XC3090 from Xilinx and 7 other XC3090
around the matrix for execution control and communication with the
host computer. The computed polynomials were of degree 16 with 74 bi-
nary digits. The gain in execution delay in comparison with the Horner
scheme comes from more complex operations than binomials and a digit
selection function inspired by the division algorithms. The E-method
leads in general to larger circuits than for the Horner scheme.
Often the original function can be approximated by polynomials of
lower degree when dividing the evaluation interval into several subin-
tervals. In each subinterval a different set of coefficients is used. For
this purpose [Kla93] combines the Horner scheme with the use of lookup
tables. The first few digits of the operands are used to decide on the
subinterval and to index a lookup table which hosts the correspond-
ing coefficients. The working principle of this method is represented in
Fig. 2.9.
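[Figure 2.9 (schematic): the first digits of x address a lookup table
which supplies the coefficients a4 ... a0 to the Horner chain
y3 = a4x + a3, y2 = y3x + a2, y1 = y2x + a1, y0 = y1x + a0; further
labels are switch 1, switch 2 and offset, and the output is tanh(x).]
Figure 2.9: Polynomial evaluation combining lookup-table and Horner
scheme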
This method has been used for the tanh evaluation in a neural net-
work implementation [GT96]. The operator realized allows an evalua-
tion of the tanh function in the interval [−4, 4] in a fixed-point repre-
sentation with 24 bit. The original interval was cut into 16 subintervals
with polynomials of degree 5. The global surface of the operator is
about 600 logic blocks of an XC4020 FPGA from Xilinx.
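The principle of selecting a coefficient set by the leading digits can be sketched as follows. This is a deliberately simplified illustration and not the operator of [GT96]: it uses floating-point numbers, polynomials of degree 1 instead of degree 5, and coefficients obtained by interpolating `math.tanh` at the subinterval boundaries, all of which are assumptions made for brevity. The names `build_table` and `tanh_approx` are hypothetical.

```python
import math

LOW, HIGH, SEGMENTS = -4.0, 4.0, 16
WIDTH = (HIGH - LOW) / SEGMENTS


def build_table():
    """One coefficient pair (a0, a1) per subinterval: p(u) = a0 + a1*u."""
    table = []
    for j in range(SEGMENTS):
        x0 = LOW + j * WIDTH
        y0, y1 = math.tanh(x0), math.tanh(x0 + WIDTH)
        table.append((y0, (y1 - y0) / WIDTH))
    return table


TABLE = build_table()


def tanh_approx(x):
    """Piecewise polynomial approximation of tanh on [-4, 4]."""
    if not LOW <= x <= HIGH:
        raise ValueError("x outside the approximation interval")
    # the leading part of x selects the subinterval (lookup-table index)
    j = min(int((x - LOW) / WIDTH), SEGMENTS - 1)
    u = x - (LOW + j * WIDTH)        # remaining offset inside the subinterval
    a0, a1 = TABLE[j]
    return a1 * u + a0               # a single Horner step for degree 1


worst = max(abs(tanh_approx(-4 + i * 0.001) - math.tanh(-4 + i * 0.001))
            for i in range(8001))
print(worst)
```

With higher-degree polynomials per subinterval, as in the implementation cited above, the same table lookup feeds a longer Horner chain and the approximation error drops accordingly.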
2.3 Speed and Size of
Redundant Arithmetic
Table 2.3 shows the time and the area complexity of the main arith-
metic operators using a parallel, a LSDF and an on-line approach. The
operand length is assumed to be n. Then, the time complexity of the
LSDF and on-line arithmetic is obviously O(n).
               Parallel                  LSDF          On-Line
Operation      Time          Area        Area          Area
±              O(1)          O(n)        O(1)          O(1)
×              O(log₂ n)     O(n²)       O(n)          O(n)
÷              O(log₂² n)    O(n²)       impossible    O(n)
√              O(log₂² n)    O(n²)       impossible    O(n)
ax + b         O(log₂ n)     O(n²)       O(1)∗         O(1)∗
∗ The operator size is linearly dependent on the length of the constant a (O(n_a)).
Table 2.3: Time–area complexity of the main arithmetic operators
Note that besides the advantageous area of on-line arithmetic for all
operations, its time complexities for nonlinear operations, like square
root and division, are close to those of parallel operators. For
computations with hard time constraints this results in multiple copies of
operators in the digit-parallel case which are very costly in hardware,
whereas the pipelining in the on-line case treats non-linear operations
like others.
2.4 Conversions between
Standard and Redundant Numbers
The conversion from a standard radix 2 number $s = \sum_{i=1}^{n} s_i 2^{-i}$ to a
redundant number $b = \sum_{i=1}^{n} b_i 2^{-i}$ is obvious ($b_i^+ - b_i^- = s_i$ with $b_i^+ = s_i$
and $b_i^- = 0$ for instance). In the case of a 2’s complement number, the
most significant digit has a negative weight. Thus the conversion to a
borrow-save representation can be done on the fly. For the conversion
from a redundant number to an analog output three different approaches
are possible, where $a^+ = \sum_{i=1}^{n} a_i^+ 2^{-i}$ and $a^- = \sum_{i=1}^{n} a_i^- 2^{-i}$:
1. A usual LSDF addition (with carry propagation) $a = a^+ - a^-$.
The conversion time for this approach is given by the computation
time of the adder ($O(\log_2 n)$) plus the D/A conversion delay.
2. Ercegovac’s on-the-fly conversion algorithm [EL87b]. It computes the
difference $a = a^+ - a^-$ on the fly. This requires the storage of two
intermediate results at all times and the final result is chosen with
the last digit. Thus the conversion time for this approach is one
clock period plus the D/A conversion delay.
3. Two D/A converters in analog difference arrangement. The difference
(voltage($a$) = voltage($a^+$) − voltage($a^-$)) is computed in an
analog way (see Fig. 2.10). The conversion time using this approach
is the D/A conversion delay only.
The third method was used for the implementation examples in Chap. 7,
because of the highest speed obtained and the small additional hardware
requirements.
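The conversions into the borrow-save representation and the recombination $a = a^+ - a^-$ can be sketched in software. The following is an illustration under stated assumptions: the helper names `unsigned_to_bs`, `twos_complement_to_bs` and `bs_value` are hypothetical, the 2's complement input is assumed to have the form $s_0.s_1 \ldots s_n$ with the sign bit $s_0$ of weight −1, and exact fractions replace the D/A converters.

```python
from fractions import Fraction


def unsigned_to_bs(bits):
    """0.s1..sn (standard radix 2) to borrow-save pairs (b+, b-) = (s_i, 0)."""
    return [(s, 0) for s in bits]


def twos_complement_to_bs(bits):
    """s0.s1..sn in 2's complement to borrow-save pairs for the ranks 0..n.

    The sign bit has negative weight, so it is routed to the minus bit;
    all other bits go to the plus bit. Each digit is converted as it
    arrives, without looking at later digits.
    """
    sign, rest = bits[0], bits[1:]
    return [(0, sign)] + [(s, 0) for s in rest]


def bs_value(pairs, first_rank=1):
    """Recombine a borrow-save number as a+ minus a-."""
    plus = sum(Fraction(p, 2 ** k)
               for k, (p, _) in enumerate(pairs, start=first_rank))
    minus = sum(Fraction(m, 2 ** k)
                for k, (_, m) in enumerate(pairs, start=first_rank))
    return plus - minus


print(bs_value(unsigned_to_bs([1, 0, 1])))                           # 0.101
# prints: 5/8
print(bs_value(twos_complement_to_bs([1, 0, 1, 1]), first_rank=0))   # 1.011
# prints: -5/8
```

In the third approach listed above, the subtraction performed in `bs_value` is the part taken over by the analog difference amplifier.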
[Figure 2.10 (schematic): the on-line operator feeds $r^+$ and $r^-$ into two
D/A converters with serial input; a difference amplifier forms the analog
output.]
Figure 2.10: D/A conversion of a redundant result $r = r^+ - r^-$ by
using the analog difference
Chapter 3
Design Concepts for
On-Line Arithmetic
Controllers
Previous work has focussed more on single on-line operations than on
their interconnection to implement complex algorithms. Consequently,
no uniform framework has existed and usually arithmetic experts have
been needed for the implementation of specific algorithms. Hans Brack-
ert stated in his PhD thesis that besides the advantages he sees in the
use of on-line arithmetic for recursive digital filters, the “... implemen-
tation of an on-line arithmetic unit is not a simple task.” ([Bra89], p. 3).
These implementation problems are mainly due to the serial character
of on-line arithmetic and to the non-unique representation of redundant
numbers.
In this chapter the controller design will be simplified by supplying
implementation guidelines for a systematic construction of real-time dig-
ital controllers in on-line arithmetic. Two different design principles are
demonstrated: one puts restrictions on the input and output representa-
tion of each on-line arithmetic operator and thus offers a set of modular
operators which can be interconnected in a convenient way; the other
leaves the representations of intermediate results open (possible because
of digit-pipelining) and normalizes only output and looped values. The
latter demands more insight into the basic on-line arithmetic proper-
ties, but offers a lower sensitivity to rounding and truncation errors of
intermediate results.
Both principles make use of a library of basic fixed-point on-line
operations, where each operator is realized following the mathematical
description in the literature. Their interfaces consist of a set of serial
inputs and outputs and an additional operator reset port (see Fig. 3.1).
The difference between the two design methods lies more in the arrange-
ment of necessary extensions around the mathematical algorithms than
in the realization of the arithmetic operation itself.
[Block diagram: a mathematical algorithm block with serial inputs a and b, serial output r, and an operator reset port.]
Figure 3.1: Common interface for operators of the arithmetic library
The guidelines given are independent of the radix used, but for
simplicity, and because the final implementation examples are in
radix 2, most of the illustrations are given for radix 2. The concepts
shown concern more the interface of on-line arithmetic operators than
their internal structure. Therefore, the realization of the mathematical
operations is of no importance for the use of the implementation guide-
lines. Either the recursive or the direct method can be employed (see
Chap. 2).
3.1 Controller Constructions based on
Modular On-Line Arithmetic Operators
This section will explain the first design procedure and its necessary ex-
tensions. In the first method we recommend the construction of modu-
lar on-line arithmetic operators. They should have a common interface
which allows the interconnection of several operators in order to imple-
ment complex algorithms even for a non-specialist in the field of on-line
arithmetic.
In order to simplify the data exchange between different operators,
the scale of inputs and outputs must be well defined, and obsolete dig-
its have to be cut off. As an indication of the validity of digits in the
data flow, an additional control signal becomes necessary. In the fol-
lowing this signal is called the control line. It is used for initialization,
normalization and synchronization purposes. The value of this signal
is synchronized to the serial data inputs/outputs and indicates if valid
digits are present or not. The mathematical operators described in the
literature don’t have these flow control functions. Therefore, they need
to be extended. In this framework each operator is composed of four
main building blocks: initialization, mathematical algorithm, normal-
ization and output switch.
[Block diagram: operands enter the mathematical algorithm block; its results pass through the normalization and out-switch blocks; the initialization block links ctr_in to ctr_out and supplies init.]
Figure 3.2: Modular on-line arithmetic operator (block sizes are not
to scale)
The different mathematical algorithms can be found in the literature
(e.g. [BDKM94]). They are supposed to have the interface described in
Fig. 3.1. The initialization resets the registers of the arithmetic opera-
tor and delays the control line corresponding to the arithmetic operator
delay. This indication of the operation start is necessary because most
on-line operators compute the first δ digits differently from the continu-
ous flow afterwards. The normalization forces the output to a predefined
representation (e.g. n digits after the decimal point). As in any fixed-
point arithmetic scheme, a number with absolute value larger than the
highest representable number will thereby saturate the output. Addi-
tionally to these three blocks, an output switch is used which forces all
digits between the operands to zero in order to avoid interference of sub-
sequent operands. The three blocks are explained in more detail in the
following Subsections 3.1.1, 3.1.2, 3.1.3. The resulting modular on-line
arithmetic operators enable system designers to construct controllers for
mechatronic systems in on-line arithmetic without advanced knowledge
in computer arithmetic. For a controller design it is sufficient to specify
the range of the intermediate values and to connect the blocks following
the design rules given in the following subsections.
3.1.1 Initialization of On-Line Arithmetic Operators
In digit-serial arithmetic the operands are distributed over several sub-
sequent operations (operators work digit wise) and there is an internal
state update in the operators at each clock period (e.g. computation of
the partial product in multipliers). Therefore, a clear indication of ev-
ery operation start is necessary for initialization of the internal registers
used. A simple way to achieve this is by a distributed control scheme
in the form of the additional control line synchronized to the operands
mentioned above. The line is kept high if significant operand digits are
present at the inputs (ctr_in) and at the outputs (ctr_out), respectively,
and is otherwise low. Internal state and status values (e.g. intermediate
results in multiplications) are thereby reset as soon as an operator is
unused. In Fig. 3.3 the initialization (init) is shown for an on-line adder in
radix 2. The two registers in the init block are necessary to compensate
the operator on-line delay (δadder = 2). As soon as ctr_in = ctr_out = 0
the three registers of the adder are reset. The initialization takes at
least one clock cycle (see Fig. 3.4).
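The behaviour of the init block can be sketched in a few lines. The sketch assumes that ctr_out is ctr_in delayed by δ register stages and that init = ctr_in ∨ ctr_out (the relation given in the caption of Fig. 3.4), so that the operator registers are reset whenever init is low; the function name is hypothetical.

```python
def control_signals(ctr_in, delta):
    """Simulate the init block for a sequence of ctr_in samples (0 or 1).

    delta: on-line delay of the operator in clock cycles (delta >= 1).
    Returns (ctr_out, init); the operator is held in reset while init == 0.
    """
    regs = [0] * delta                 # chain of delay registers
    ctr_out, init = [], []
    for c in ctr_in:
        out = regs[-1]                 # control line delayed by delta cycles
        ctr_out.append(out)
        init.append(c | out)           # init = ctr_in OR ctr_out
        regs = [c] + regs[:-1]         # shift the register chain
    return ctr_out, init


# Three valid digits followed by zeros, adder delay of 2.
ctr_out, init = control_signals([1, 1, 1, 0, 0, 0, 0], delta=2)
```

With these inputs, init stays high for two cycles after ctr_in has fallen, so the last digits of the result can leave the operator before it is reset.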
[Circuit diagram: radix 2 on-line adder with inputs a+, a−, b+, b− and outputs s+, s−, three internal registers, an init block with two registers between ctr_in and ctr_out, and an out-switch block.]
Figure 3.3: On-line adder modified for real-time control
In the initialization scheme the digits of the result must have left
the operator entirely before the reset can be achieved. Otherwise the
last δ (on-line delay) digits of the result would be wrong. Therefore
at least (δmax + 1) intermediate zeros between the operands must be
inserted at algorithm entry, where δmax is the largest delay of all of the
operators in the entire algorithm. In Fig. 3.4, an algorithm is assumed
to have two operators and δmax = δOp1 > δOp2. When ctr1 becomes
low, intermediate zeros have to be inserted on the entries a and b. The
additional delay is introduced for the initialization.
[Block and timing diagram: operators Op1 and Op2 with init blocks init1 and init2, a FIFO in the feedback path, control lines ctr1, ctr2, ctr3, and the delays δOp1, δOp2, δinit1, δinit2.]
Figure 3.4: Initialization and synchronization of on-line operators,
init = ctr_in ∨ ctr_out
The zeros increase the sampling period because they cause a delay
between subsequent operands. One way to avoid this delay is to separate
the digit accumulation and the digit generation part of the operators.
This is done by design in the recursive operator formulation of Erce-
govac, but leads to an additional copy of the original operator in the
direct formulation. However, in mechatronic control applications a new
controller input (at the sampling instant) is only taken at the same time
or after the last controller output was supplied to the physical system
(termination of the D/A conversion). This sampling time delay, which
has to be taken into account for the controller design, introduces many
more intermediate zeros anyway. Fig. 3.5 shows a case where converter
resolution and operand length in the arithmetic are the same. Thereby,
the number of zeros is determined by:
δzeros = δD/A + δA/D + δarithmetic (3.1)
where δD/A, δA/D, δarithmetic are the delays of the D/A converter, the
sampler of the A/D converter and the input-to-output delay of the con-
troller, respectively. Note that δzeros has to fulfill the above mentioned
condition:
δzeros ≥ δmax + 1 (3.2)
This is usually the case. Otherwise additional intermediate zeros have
to be inserted.
[Timing diagram over time/τ: between two sampling instants, the A/D conversion (δA/D), the arithmetic delay (δArithmetic), the operand length (δn) and the D/A conversion (δD/A), with the intermediate zeros (δzeros) and the period (δperiod) marked.]
Figure 3.5: Controller timing (length(operand) = n = res(A/D)), δn
stands for the n digit delay due to the length of the operands
In order to avoid interference from subsequent numbers, these
intermediate zeros have to be maintained, even after several operations
in the algorithm. This output switching can be realized with the
additional control line (see Fig. 3.3, out-switch block). This disabling
of operator outputs becomes particularly important for operations like
multiplications where the result has a larger representation than the
input operands.
The distributed control scheme presented improves the modularity
of the design and offers some simple flow control functions. Controller
execution can be stopped easily by resetting the registers in the initial-
ization block and the operand ranges can be chosen by simply shifting
the control signal.
3.1.2 Normalization
In redundant number systems some numbers have several representations
(e.g. $1 - \frac{1}{4} = 1.0\bar{1} = 0.11 = \frac{1}{2} + \frac{1}{4}$ in radix 2, notations as in
Sect. 2). This property implies that in on-line additions the sum may
be represented by n+ 1 valid digits whereas the operands and the theo-
retical result only need n digits. In multi-adders even several additional
digits are possible. In order to avoid a continuously growing number
of digits after additions, especially in state loops (growing number of
additions), a conversion to a limited representation becomes necessary.
Otherwise, truncation operations to a limited representation can lead
to large errors because the most significant parts of numbers could also
be cut off.
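The non-uniqueness can be checked numerically with a small helper. This is only an illustrative sketch: the function name and the convention that the digit of rank i has weight r^(−i) are assumptions made for this example.

```python
from fractions import Fraction


def sd_value(digits, first_rank=0, r=2):
    """Exact value of a signed-digit string, most significant digit first.

    The digit at position i has rank first_rank + i and weight r**(-rank).
    """
    return sum(Fraction(d) * Fraction(r) ** -(first_rank + i)
               for i, d in enumerate(digits))


a = sd_value([1, 0, -1])   # 1.0(-1): unit digit 1, then 0 and -1
b = sd_value([0, 1, 1])    # 0.11
```

Both strings evaluate to 3/4, although the first one occupies the unit position and the second does not. Truncating the unit digit of the first string would therefore remove the most significant part of the number.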
In previous literature, two approaches have been presented. One is
the complete on-the-fly normalization algorithm proposed by Ercegovac
and Lang [EL87b], which converts redundant numbers into conventional
digital representations. Its basic idea is to accumulate the operator
output digits successively and to compute two partial results at all times, one
anticipating that the next digit will be positive or zero, whilst the other
expects a negative digit. The final result is obtained with the last digit.
Contrary to a standard addition, this method avoids delays related to
the propagation of carry associated with the sign differences. Another
method is Merrheim’s normalization algorithm [Mer94] which generates
a redundant fractional number with zero unit part (i.e. 0.s1s2s3 . . .).
The former causes a delay of n clock cycles in forward branches (pipelin-
ing of several on-line operators) and is more difficult to implement than
the latter. Merrheim’s algorithm works well (without on-line delay!) for
feed-forward branches and loops. However, in its original form it is only
appropriate for additions of two numbers. Therefore, an extension of
Merrheim’s algorithm also suitable for multi-additions is proposed.
Proposition: Depending on the choice of scale for the intermediate
results, only two types of result need conversion ($|s| < r^{-x}$, the
first valid digit of the normalized result should have rank x+1).
All other results are already normalized or they saturate for the
given scale:
\[
\begin{array}{r|cccccccccc}
 & & & & & \downarrow & & & & & \\
\text{Rank} & x-k & x-k+1 & \cdots & x & x+1 & \cdots & x+m-1 & x+m & \cdots & x+n \\
\hline
 & 1 & 1-r & \cdots & 1-r & 0 & \cdots & 0 & -a & s_m \cdots & s_n \\
\Rightarrow & 0 & 0 & \cdots & 0 & r-1 & \cdots & r-1 & r-a & s_m \cdots & s_n \\
 & -1 & r-1 & \cdots & r-1 & 0 & \cdots & 0 & a & s_m \cdots & s_n \\
\Rightarrow & 0 & 0 & \cdots & 0 & 1-r & \cdots & 1-r & a-r & s_m \cdots & s_n
\end{array}
\]
where r is the radix and a (−a), satisfying 0 < a < r, is the first
non-zero digit with rank greater than x. The decimal point is not
indicated because it is of no importance for the normalization. The
arrow (↓) indicates the first valid digit of the normalized result.
Proof: Consider $s_i$ and $s'_i$ to be the digits before and after the
conversion, respectively. Suppose there are k + 1 non-zero digits before
the first digit of the chosen scale. Then up to the (x+m)th digit:
\begin{align*}
\sum_{i=x-k}^{x+m} s_i r^{-i}
  &= r^{k-x} + (1-r)\sum_{i=-x}^{k-1-x} r^{i} - a\,r^{-x-m} \\
  &= r^{k-x} - r^{-x}\left(r^{k}-1\right) - a\,r^{-x-m} \\
  &= r^{-x} - a\,r^{-x-m} \\
  &= (r-1)\sum_{i=x+1}^{x+m-1} r^{-i} + (r-a)\,r^{-x-m} \\
  &= \sum_{i=x+1}^{x+m} s'_i r^{-i}
\end{align*}
Conversion of the negative case is shown similarly. □
Remark: In case of overflow, the closest possible value appears on the
output ((r − 1) . . . (r − 1) or (1− r) . . . (1− r), respectively).
As can be seen in the scheme shown above, the normalization al-
gorithm is very simple. The first 1 (-1) digit of the redundant result
has to be detected and propagated to the right until the first negative
(positive) digit appears. The following digits are left unchanged. This
conversion can be done on-the-fly, that means simultaneously to the
shift operation of the digits, without introducing any on-line delay. The
digit position to which the operand should be normalized (x+ 1 in the
proof) is indicated by the control line output, ctr out.
A numerical example in radix 2 is given. Suppose $1\bar{1}\bar{1}.00\bar{1}1\bar{1}1$ to be
the result of a multi-adder on-line operation which should be normalized
to a fractional number. Then the normalization extension will change
the digits as they appear to the normalized result of $000.1111\bar{1}1$ without
any on-line delay.
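The rewriting rule of the proposition can be sketched as follows. This is an off-line list version written for illustration only; the hardware performs the same substitution digit by digit as the digits are shifted. The function names and the parameter `lead` (the number of digits above the chosen scale) are assumptions of this sketch.

```python
from fractions import Fraction


def normalize(digits, lead, r=2):
    """Normalize a signed-digit string (MSD first) so that the first `lead`
    digits become zero; saturate if the value does not fit the scale."""
    head, tail = list(digits[:lead]), list(digits[lead:])
    v = 0
    for d in head:
        v = v * r + d                  # integer value held by the leading digits
    if v == 0:
        return [0] * lead + tail       # already normalized
    sign = 1 if v > 0 else -1
    # position of the first non-zero digit after the leading part
    m = next((i for i, d in enumerate(tail) if d != 0), None)
    if abs(v) == 1 and m is not None and tail[m] * sign < 0:
        # propagate the leading +1 (-1): zeros become r-1 (1-r) and
        # -a (a) becomes r-a (a-r); the remaining digits are unchanged
        fixed = [sign * (r - 1)] * m + [tail[m] + sign * r] + tail[m + 1:]
        return [0] * lead + fixed
    # overflow: closest representable value
    return [0] * lead + [sign * (r - 1)] * len(tail)


def value(digits, lead, r=2):
    """Exact value; digits[lead - 1] is the unit digit."""
    return sum(Fraction(d) * Fraction(r) ** (lead - 1 - i)
               for i, d in enumerate(digits))


x = [1, -1, -1, 0, 0, -1, 1, -1, 1]    # the numerical example above
y = normalize(x, lead=3)
```

For the numerical example, `y` is [0, 0, 0, 1, 1, 1, 1, -1, 1] and both strings have the value 59/64, so the substitution changes the representation but not the number.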
The algorithm was implemented for radix 2 in an Actel FPGA and
requires approximately the space of 20 Actel 2 cells. This cell number
is not significant if used only a few times in a design (operator combi-
nations reduce occurrence, see Sect. 4.1).
3.1.3 Synchronization
Implementations of dynamic systems give rise to loops in the signal flow
due to the states. These states need to be synchronized to the signals in
the forward flows. In Fig. 3.4 the output of the second operator (Op2)
is a state that is used as input to Op2 at the next sampling instant. For
synchronization, shift registers of appropriate size have to be included
in the backward branch. Two solutions have been investigated. The
classical approach assumes a shift operation at each clock cycle and
register sizes have to be adapted accordingly:
δreg = δperiod − ∑ δforward (3.3)
where δreg, δperiod, δforward are the register size (in number of shift
stages), the sampling period (in number of clock cycles) and the delay
of the operators in the forward part of the loop, respectively. This
method is simple, but requires a lot of additional registers for a big
δzeros (see Eqn. 3.1).
[Block diagram: operators Op1 (δOp1 = 3) and Op2 (δOp2 = 2) with inputs a+, a−, b+, b− and outputs r+, r−; a FIFO with δFIFO = 15 closes the state loop around Op2.]
Figure 3.6: State register without flow control, δn = 8,
δA/D = δD/A = 2
An example is given in Fig. 3.6. Op1 has the highest delay of the
algorithm (δOp1 = δmax = 3). This leads to δarithmetic = 3 + 2 = 5
and δzeros,min = 4 (Eqn. 3.2). We assume δA/D = δD/A = 2. This
leads to δzeros = 2 + 2 + 5 = 9 (Eqn. 3.1). For an operand length of
δn = 8 bit this gives a period of δperiod = δn + δzeros = 17. The resulting
register size is therefore:
δreg = δperiod − δOp2 = 17 − 2 = 15
that is, 7 shift stages larger than the operand length.
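The arithmetic of this example can be retraced with three one-line helpers for Eqns. 3.1 to 3.3. The function names are hypothetical, and the relation δperiod = δn + δzeros is inferred from the numbers of the example (8 + 9 = 17) rather than stated as a formula in the text.

```python
def zeros_between_operands(d_da, d_ad, d_arith):
    """Eqn. 3.1: intermediate zeros caused by the converters and the arithmetic."""
    return d_da + d_ad + d_arith


def min_zeros(d_max):
    """Eqn. 3.2: smallest admissible number of intermediate zeros."""
    return d_max + 1


def state_register_size(d_period, forward_delays):
    """Eqn. 3.3: shift stages needed in the backward branch of a loop."""
    return d_period - sum(forward_delays)


d_op1, d_op2 = 3, 2                    # operator delays of Fig. 3.6
d_arith = d_op1 + d_op2                # input-to-output delay of the controller
d_zeros = zeros_between_operands(2, 2, d_arith)
assert d_zeros >= min_zeros(max(d_op1, d_op2))   # condition (3.2) holds
n = 8                                  # operand length in digits
d_period = n + d_zeros                 # assumed: operand digits plus zeros
d_reg = state_register_size(d_period, [d_op2])   # only Op2 lies in the loop
```

The sketch reproduces δarithmetic = 5, δzeros = 9, δperiod = 17 and δreg = 15.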
The second approach assumes a shift operation only when the stored
digits are needed in the operator, otherwise the shift operation is dis-
abled. Therefore, the register size equals the number of operand digits n
and an extra enable input to the shift register has to be supplied. When
enabled, the register works similarly to a standard shift register; when
disabled, it keeps its content and outputs zeros in order to avoid any
interference with ongoing computations. The enable signal is generated
from simple synchronization conditions as shown in Fig. 3.7 with the
aid of the control line. The OR and AND gate of Fig. 3.7 can be integrated
into the register. The result is a shift register with two additional inputs
(later called on-line fifo) which are both connected to the ctr_in signal
in the case of a register in the forward path and connected to ctr_in
and ctr_out in the case of a loop. The main advantage of this approach
is the register size, which is only dependent on the number of operand
digits and not on the delay of the operators.
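A behavioural sketch of such a gated shift register is given below. It models only the rule stated above (shift when enabled; hold the content and output zero when disabled). The class name is hypothetical, and the two control inputs of the on-line fifo are assumed to be already combined into a single enable signal.

```python
class OnlineFifo:
    """Shift register of n stages with an enable input."""

    def __init__(self, n):
        self.stages = [0] * n          # stages[-1] is the oldest digit

    def clock(self, din, enable):
        """Advance one clock cycle and return the output digit."""
        if not enable:
            return 0                   # content is kept, output forced to zero
        dout = self.stages[-1]
        self.stages = [din] + self.stages[:-1]
        return dout


fifo = OnlineFifo(3)
written = [fifo.clock(d, enable=1) for d in (1, -1, 1)]   # store one operand
held = [fifo.clock(0, enable=0) for _ in range(5)]        # idle for 5 cycles
read = [fifo.clock(0, enable=1) for _ in range(3)]        # read it back
```

The stored operand comes back in the original digit order regardless of how many idle cycles lie in between, which is why the register size depends only on the operand length.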
[Block diagrams, two panels ("Register in Forward Branch" and "Register in Feedback Loop"): operator Op1 with its init block between ctr_in and ctr_out, and a FIFO with data input, output and enable (en) derived from the control line.]
Figure 3.7: Flow control of state registers (size
δreg = n = length(operands)) in forward and loop arrangement
In situations with several parallel data paths, different on-line delays
can be compensated, similarly to standard serial arithmetic, by inserting
an appropriate number of shift registers (see e.g. [Kas98] or Fig. 2.2).
3.2 Controller Construction based on
Global Execution Control
The modular on-line operators described above are easy to use and are
well adapted to situations where the scale of the intermediate results
differs very little. Otherwise, the number of operand digits has to be
increased in order to avoid truncation errors. This leads to an
increase in the number of clock cycles needed per sampling period and to a more
complicated input/output interface. In order to avoid these two
disadvantages, an alternative controller design procedure can be used:
global execution control. Its design rules demand more insight into the
theory of on-line arithmetic, but they can guarantee the same preci-
sion of the final result as with modular on-line operators by keeping
the number of the operand digits equal to the I/O resolution. This is
possible due to a variable scale of the intermediate results. The main
difference of the global execution control in comparison to the modular
on-line operators lies in the use of the extensions.
Fig. 3.8 shows an example for a controller composed of 4
mathematical operations. Only the initialization extension is employed for
every operator; the normalization and out-switch are restricted to loops
and controller output. In the following subsections the initialization,
normalization and synchronization of the global execution control will
be put into perspective compared with the modular on-line operators
described in the previous section.
[Block diagram: operators Op1 to Op4 with inputs in_1, in_2 and output out; each operator has an extended initialization block (init (ext)); an on-line FIFO and two "Normalization and Out-Switch" blocks; control signals ctr_in, ctr_1, ctr_2, ctr_scale, ctr_ahead, ctr_out.]
Figure 3.8: Design example based on global execution control
(the extended initialization is explained in Figs. 3.9, 3.10)
3.2.1 Extended Initialization
The initialization operation (init) introduced for the modular on-line
operators in Subsect. 3.1.1 is based on a normalized representation of
the intermediate operands. However, in the global execution scheme the
normalization operation is restricted to the controller output and feed-
back loops only. Therefore, input digits with rank higher than the lowest
digit of the chosen operand length can still propagate their influence to
valuable output digits. An initialization as for modular on-line opera-
tors (with the falling edge of the control line output, ctr out) would lead
to truncation errors of control output. A way to avoid these truncation
errors is to ensure additional output digits on intermediate results. This
can be done by extending the initialization operator by a chain of reg-
isters. These registers, indicated in Fig. 3.9b, delay the influence of the
control line output on the initialization signal. The number of necessary
registers is equal to the on-line delay of the path between the operator
concerned and the controller output:
nreg = δpath (3.4)
Note that these additional registers are only necessary for operators
which produce additional digits, like multiplications, or if they are pre-
ceded by such operations. In all other cases the additional digits are
equal to zero and therefore don’t change the controller output.
[Four register-chain diagrams between ctr_in and ctr_out: a) δoperator = 4; b) δoperator = 4, δpath = 3; c) as b) with the additional ctr_ahead feed-through; d) as c) with δshift = 1 and the additional ctr_shift output.]
Figure 3.9: Extending the initialization for global execution control
The initialization in the global execution scheme has another par-
ticularity. The missing normalizations between single operators imply
that already the first input digits (those to Op1 and Op2 in Fig. 3.8)
influence leading digits in the subsequent operators through combina-
toric paths. In the last operator (Op4 in Fig. 3.8), the leading digits
produced are used by the final normalization and thus influence the
controller output. An initialization as in the modular on-line operator
scheme (until the rising edge of the control line input, ctr_in) would
cut off the leading (high value) digits of intermediate results and thus
cause large truncation errors in the controller output. In order to stop
the initialization of subsequent operators with the propagation of the
first digit, the initialization has to be extended by a second control line
which is fed through all initialization operators. This line is indicated
as ctr_ahead in Fig. 3.9c.
3.2.2 Extended Normalization
As mentioned above, the normalization and the out-switch for the global
execution control scheme are restricted to loops and controller output.
The normalization operator used is identical to that of the modular
on-line operator scheme, i.e. the normalization starts as soon as output
digits are generated (rising edge of control line input, ctr_in) and it
normalizes to the rank indicated by a rising edge of the control line
output, ctr_out. In the global execution control design, the scale of
the looped values can be set to an arbitrary value in advance. The
necessary normalization can be controlled by simply shifting the control
line output. A non-shifted control line output indicates an identical scale
for operator output and looped value, whereas each shift register changes
the scale by a factor of r (r is the value of the radix). The easiest way to
realize this shift is to add a fourth output to the initialization operator
as indicated in Fig. 3.9d (ctr_shift). It takes intermediate values of the
control line and thereby forces the normalization to the specified scale.
[Block diagram: init (ext) block with input ctr_in, outputs ctr_out, ctr_shift and init, feed-through ctr_ahead, and parameters δoperator, δpath, δshift.]
Figure 3.10: Interface of the extended initialization operator
As shown in Fig. 3.10, the three extensions described in Fig. 3.9
can be combined into a new extended initialization operator with one
input (control line input ctr_in), three outputs (shifted control line
ctr_shift, control line output ctr_out, initialization signal init) and one
feed-through (forward control ctr_ahead). The new initialization (init
(ext)) has three generic parameters (δoperator, δshift, δpath) which have
to be adjusted for each individual operator. In general not all of its
ports are used for every initialization. In the design example of Fig. 3.8
only the initialization of operator 3 (Op3) is fully connected.
3.2.3 Extended Synchronization
The synchronization in the global execution scheme is similar to the
modular on-line operator case. Both solutions indicated in Sect. 3.1.3
are possible: the synchronization with adapted register size (Fig. 3.6) or
the controlled shift registers with constant size (Fig. 3.7 or on-line fifo
in Fig. 3.8). For calculation of the register size of the former approach
the shift operation mentioned in Sect. 3.2.2 has to be taken into account:
δreg = δperiod − ∑ δforward + δshift (3.5)
where δreg, δperiod, δforward, δshift are the register size (in number of
shift stages), the sampling period (in number of clock cycles), the delay
of the operators in the forward part of the loop and the shift of the
control line output, respectively.
3.3 Design Example
This section will illustrate the two design methods on an implementation
example. The PID (proportional integral differential) algorithm with
filtered differential part is given by:
\[
u_k = K\left(e_{k-1} + \frac{h}{T_I}\,x_{i,k-1} + T_D\,x_{d,k-1}\right) \qquad (3.6)
\]
with
\[
x_{i,k-1} = e_{k-1} + x_{i,k-2}, \qquad
x_{d,k-1} = \gamma\left(e_{k-1} - e_{k-2} + \tau\,x_{d,k-2}\right)
\]
where ek = sk − yk is the difference between set-point sk and measure-
ment yk, xi,k and xd,k the controller states, uk the controller output, h
the sampling period, τ the time constant of the differential part filter,
γ = 1/(h + τ) and K, TI , TD are the proportional, integral and dif-
ferential gain, respectively. For the delay of one sampling period h see
Fig. 3.5.
Contrary to Sect. 4.2 and Sect. 7.1, where the equations will be
transformed into another representation before the implementation, herein
the original form of Eqn. 3.6 will be used directly. In order to show all
necessary design adjustments, the controller constants and the sampling
time of an existing current loop are chosen.
h = 25 · 10^−6 s (3.7)
K = 6
TD = 0.1
TI = 0.6
τ = 0.01
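For orientation, Eqn. 3.6 with the constants of Eqn. 3.7 can be written as a floating-point reference model. This is a plain software sketch, not the on-line arithmetic implementation; the function name and argument order are hypothetical.

```python
h, K, TD, TI, tau = 25e-6, 6.0, 0.1, 0.6, 0.01   # constants of Eqn. 3.7
gamma = 1.0 / (h + tau)                           # as defined below Eqn. 3.6


def pid_step(e, e_prev, xi_prev, xd_prev):
    """One evaluation of Eqn. 3.6.

    e, e_prev        : e[k-1] and e[k-2]
    xi_prev, xd_prev : x_i[k-2] and x_d[k-2]
    Returns (u[k], x_i[k-1], x_d[k-1]).
    """
    xi = e + xi_prev                              # integral state
    xd = gamma * (e - e_prev + tau * xd_prev)     # filtered differential state
    u = K * (e + (h / TI) * xi + TD * xd)
    return u, xi, xd


# Combined coefficients that a constant multiplier would have to realize.
coefficients = {
    "gamma": gamma,            # about 99.751
    "gamma*tau": gamma * tau,  # about 0.9975
    "K*h/TI": K * h / TI,      # 2.5e-4
    "K*TD": K * TD,            # 0.6
}
```

The combined coefficients appear to correspond to the constants 99.751, 0.9975, 2.5e-4, 0.6 and 6 shown in the block diagram of Fig. 3.11. The very small value of K·h/TI compared with K illustrates why the scales of the integral state and of the error signal differ so strongly.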
An implementation with modular on-line arithmetic operators is
shown in Fig. 3.11. Therein each operator is extended by an initial-
ization, normalization and out-switch block.
[Block diagram: inputs in1, in2 and ctr_in; signals e, xd, xi, u; operators with delays δ = 2, δ = 2, δ = 10 and δ = 12; constants 99.751, 99.751, 0.9975, 2.5e-4, 0.6 and 6; three on-line FIFOs (ol-FiFo); each operator followed by a "norm and switch" block and controlled by an init block; output ctr_out.]
Figure 3.11: Modular on-line arithmetic operator design of the PID
controller
The multiply–add operations are realized by multi-operators (see
Sect. 2.2.2, 4.1). The execution control is entirely distributed by the
additional control line. The synchronization is made by on-line FIFOs.
In the controller representation chosen, the scale of the integral state xi
and the error signal e are very different and because of the large delay
(δ = 12) of the last operator, the output is influenced by digits with
much higher rank than the chosen output resolution. Therefore, the
number of the operand digits has to be higher than the input/output
resolution in order to avoid truncation errors. A precision of 12 digits on
the output requires in the case presented a representation of 4 leading
digits before the point and 24 digits behind. These extra digits are
inserted by forcing the control line input to be generated 4 clock cycles
in advance of the A/D-conversion digits and by adding 12 extra zeros at
the end of the input. At the output, the valuable digits are separated by
adding additional registers in the last initialization block. In Fig. 3.11,
the necessary interface signals were omitted because they are specific
to the chosen hardware. Note that the generation of these additional
digits increases the size of the controller interface. Additionally, the
sampling period is at least 3 digits larger (the other 13 digits can be
used as intermediate zeros).
The same control algorithm can also be implemented in the global
execution design. This is shown in Fig. 3.12.
[Block diagram: same signal flow as Fig. 3.11 (inputs in1, in2, ctr_in; signals e, xd, xi, u; constants 99.751, 99.751, 0.9975, 2.5e-4, 0.6 and 6; three on-line FIFOs), with three "norm and switch" blocks and four init blocks parameterized by (δp = 0, δs = 0), (δp = 0, δs = 4), (δp = 0, δs = 0) and (δp = 12, δs = 0); output ctr_out.]
Figure 3.12: Global execution control design of PID
Normalization and out-switch are now restricted to loops and con-
troller output. For each initialization unit, three sets of design param-
eters have to be adjusted. The first set of parameters, the operator
delays, are the same as for the modular on-line arithmetic operator im-
plementation except for the adder delay. It is bigger (δ = 4 instead of
δ = 2) because the integral state will be stored with 4 leading digits
whereas the inputs and the output are still represented with 12 digits
behind the point. It is for the same reason that the shift parameter is
δshift = 4 for the adder and zero for the others. The third set of param-
eters, the path value for additional result digits, is only necessary for
one of the add-multipliers because its result is used in the final multi-
operator (δpath = 12). It is remarkable that the same output precision
as with the modular on-line operator scheme can be reached without
increasing the sampling period and without any change of the standard
controller interface. Also one normalization extension could be avoided.
The cost is a more complex adaptation of design parameters and longer
combinatoric paths which can sometimes require intermediate registers.

Chapter 4
Implementation
Guidelines
The following guidelines concern both design methods presented in
Chap. 3, the modular operators as well as the global execution scheme.
The first two sections discuss the most advantageous operator and con-
troller representations for an on-line arithmetic implementation. This
is followed by a section explaining the possibilities to reuse certain op-
erators several times in the same controller. Finally, suitable hardware
platforms are introduced briefly and a VHDL on-line library, written
for test and verification purposes, is described.
4.1 Simplifications with Multi-Operations
In control algorithms scalar products are very common (e.g. two de-
grees of freedom controllers or state space controllers). On-line arith-
metic offers an efficient simplification for these kinds of operations. The
static logic part of all multiplications can be executed in parallel with-
out on-line delay and the final addition can be performed, as already
mentioned in Sect. 2.2, by a simplified final adder with a much smaller
delay than a binary tree of adders. The algorithm is given in [BDKM94].
This simplification keeps the overall delay of the controller as well as
the number of individual operators very small. The consequence is a
reduced computation time and a small number of necessary normaliza-
tion units in the modular on-line operator scheme (scaling is limited
to a few intermediate results). In Fig. 4.1 the computation scheme for
a two-degrees-of-freedom controller is shown. The controller employed,
$R(q^{-1})\hat{u}(t) = T(q^{-1})\hat{r}(t) - S(q^{-1})\hat{y}(t)$ ($\hat{u}$, $\hat{r}$, $\hat{y}$ stand for the digital
values of the input and output signals and $q^{-1}$ for the backward shift operator,
e.g. $a q^{-1}\hat{u}(t) = a\hat{u}(t-h)$, where a is a constant and h is the sampling
time), is:
\begin{align*}
R(q^{-1}) &:= 1 - 1.1636\,q^{-1} + 0.16356\,q^{-2} \qquad (4.1) \\
S(q^{-1}) &:= 0.9665\,q^{-1} - 0.6229\,q^{-2} + 0.5895\,q^{-3} - 0.3513\,q^{-4} \\
T(q^{-1}) &:= 0.7758\,q^{-1} - 0.9558\,q^{-2} + 0.7619\,q^{-3}
\end{align*}
The multiplication blocks indicate the static part of constant multi-
pliers and the sum is realized by a multi-adder with delay δ = 5 instead
of a binary tree of adders which would require a delay δ = 9 and a much
higher number of registers.
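The controller of Eqn. 4.1 can be written as a single scalar product of
nine constants with nine past signal values, which is the sum a
multi-adder forms in one operation. The following Python sketch only
illustrates this arithmetic in floating point; it does not model on-line
digits or delays, and the names `COEFFS`, `past` and `simulate` are
illustrative assumptions, not part of the thesis.

```python
# Coefficients of Eqn. 4.1, ordered by increasing delay (q^-1, q^-2, ...).
# R is monic: its leading 1 stays on the left-hand side, the remaining
# terms are moved to the right-hand side with opposite sign.
R_TAIL = [-1.1636, 0.16356]
S = [0.9665, -0.6229, 0.5895, -0.3513]
T = [0.7758, -0.9558, 0.7619]

# Constant vector of the single scalar product (9 terms).
COEFFS = [-c for c in R_TAIL] + T + [-c for c in S]


def past(seq, t, n):
    """Return [seq[t-1], ..., seq[t-n]], using 0.0 before the first sample."""
    return [seq[t - i] if t - i >= 0 else 0.0 for i in range(1, n + 1)]


def simulate(r, y):
    """Compute u(t) for equally long reference and measurement sequences."""
    u = []
    for t in range(len(r)):
        operands = past(u, t, 2) + past(r, t, 3) + past(y, t, 4)
        # One sum over all products, as formed by the multi-adder.
        u.append(sum(c * x for c, x in zip(COEFFS, operands)))
    return u
```

Because T and S start with the q^-1 term, the output at time t depends
only on past samples, so the sketch has no direct term.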
 
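[Figure 4.1: block diagram. The inputs ref (r_k) and pos (y_k) and the
fed-back output u_k pass through FIFO delay chains; the delayed values
are multiplied by the constants t1-t3, s1-s4, r1, r2 and summed in one
multi-adder (δ = 5), followed by normalization, out-switch and
initialization blocks; control lines ctr_in and ctr_out; output u_{k+1}.]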
Figure 4.1: Multi-adder for a two degrees of freedom controller
The only disadvantage of multi-operations is a longer critical path
and thus a lower maximum clock frequency. In control applications clock
speed is usually lower than that limit (because of the A/D converter)
and in cases where a higher clock speed is required, buffers can be
inserted to shorten the critical path.
Multi-operations allow an abundance of different operators which are
easy to construct by following systematic rules (see e.g. [BDKM94]). In
order to keep the number of individual operators in an on-line library
reasonable, the generation of the final operator is done by code genera-
tion procedures which take the characteristic properties of the intended
algorithm as input (e.g. number of inputs and corresponding scales) and
produce a synthesizable operator description. In this thesis a procedure
for the automatic generation of multi-adders has been written. It takes
the number of product and scalar terms as well as their ranks as inputs
and automatically generates the corresponding VHDL¹ code of the
necessary on-line arithmetic operator. This greatly simplifies the
construction of linear controllers and reduces the number of individual
adders in the on-line library.
4.2 Appropriate Controller Representation
Dynamic controllers usually consist of two parts: a state update and a
direct computation. The controller equations can be written as:
$$z_{k+1} = f(z_k, s_k, r_k) \quad \text{(State equations)} \qquad (4.2)$$
$$u_k = g(z_k, s_k, r_k) \quad \text{(Output equations)}$$
where $z_k$, $s_k$, $r_k$, $u_k$ are the controller states, the measurements, the
references and the controller outputs, respectively. The controller states
act thereby as auxiliary variables. In contrast to sequential processing
with digit-parallel operators, the state updates and the computation of
new controller output values are done in parallel in on-line arithmetic.
Therefore, the highest complexity of f or g determines the number of
necessary clock cycles per sampling interval. This complexity is mainly
influenced by the choice of the states. Most controllers allow several
state space representations with an identical input/output behavior.
The goal of this section is to find the most appropriate representation
for an on-line arithmetic computation with respect to computation time
and numerical properties.
In on-line arithmetic not only the arithmetic structure itself but also
the scale of intermediate results plays an important role with respect to
computation time. As illustrated in the example shown in Fig. 4.2,
the range of multiplicative constants has a significant influence on the
operator delay. The number of operations and their input/output
behavior are the same in both representations, but the on-line delays and
thus also the sampling times differ significantly. This example is kept
simple for demonstration purposes, but the effect shown can be observed
with any state-space transformation. The two controller representations
are given as:
Representation 1:
$$x_{k+1} = x_k + 0.9\,(s_{k-1} - y_{k-1}) \qquad (4.3)$$
$$u_k = 2x_k + 0.1\,(s_{k-1} - y_{k-1})$$
Representation 2 (transformation $x_k = 0.1 z_k$):
$$z_{k+1} = z_k + 9\,(s_{k-1} - y_{k-1}) \qquad (4.4)$$
$$u_k = 0.2z_k + 0.1\,(s_{k-1} - y_{k-1})$$
¹ Hardware Description Language, see Sect. 4.4
[Figure 4.2: block diagrams of both representations with inputs s_k, y_k
and output u_{k+1}. Representation 1 (constants 0.9, 0.1, 2): operator
delays δ = 4 and δ = 3. Representation 2 (constants 9, 0.1, 0.2):
operator delays δ = 1 and δ = 7.]
Figure 4.2: Influence of constant scaling on operator delay
This is due to the serial character of on-line arithmetic computa-
tions. Intermediate results with large scale differences, as generated for
example by large multiplicative constants, require internal registers for
synchronization. These additional registers increase the operator delay.
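That Representations 1 and 2 have the same input/output behavior can be
checked numerically. The sketch below is a plain floating-point
simulation with zero initial state; the argument `e` stands for the
sequence of differences s_{k-1} - y_{k-1}, and the function names are
illustrative. It says nothing about on-line delays, which depend on the
hardware operators.

```python
def run_rep1(e):
    """Representation 1 (Eqn. 4.3): x_{k+1} = x_k + 0.9 e, u_k = 2 x_k + 0.1 e."""
    x, u = 0.0, []
    for ek in e:
        u.append(2 * x + 0.1 * ek)
        x = x + 0.9 * ek
    return u


def run_rep2(e):
    """Representation 2 (Eqn. 4.4): z_{k+1} = z_k + 9 e, u_k = 0.2 z_k + 0.1 e."""
    z, u = 0.0, []
    for ek in e:
        u.append(0.2 * z + 0.1 * ek)
        z = z + 9 * ek
    return u


def max_difference(e):
    """Largest absolute difference between the two output sequences."""
    return max(abs(a - b) for a, b in zip(run_rep1(e), run_rep2(e)))
```

The state z is ten times larger than x, which is the scale difference
that the text relates to the additional synchronization registers.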
In a state space representation based on Jordan form the scale of the
multiplicative constants is very similar because the diagonal values are
equal to the controller poles and these approach a value of 1 for fast
sampling. The state updates and the output equations can be computed
in parallel and the constants in the dynamic matrix are scaled ($|a_{ii}| \le 1$
for standard controllers). Additionally, the number of multiplications
is low due to the zeros of the block diagonal representation of the dy-
namic matrix. Based on the advantages mentioned above the use of
the Jordan form for controller implementations in on-line arithmetic is
recommended.
The transformation to Jordan form is highlighted by an implementation
example. The simple proportional integral differential (PID) algorithm
with filtered differential part is given by:
$$u_k = K\left(e_{k-1} + \frac{h}{T_I}\,x_{i,k-1} + T_D\,x_{d,k-1}\right) \qquad (4.5)$$
with
$$x_{i,k-1} = e_{k-1} + x_{i,k-2}$$
$$x_{d,k-1} = \gamma\,(e_{k-1} - e_{k-2} + \tau x_{d,k-2})$$
where $e_k = s_k - y_k$ is the difference between set-point $s_k$ and
measurement $y_k$, $x_{i,k}$ and $x_{d,k}$ are the controller states, $u_k$ is the
controller output, $h$ the sampling period, $\tau$ the time constant of the
differential part filter, $\gamma = 1/(h + \tau)$, and $K$, $T_I$, $T_D$ are the
proportional, integral and differential gain, respectively.
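Transformation to Jordan form gives:
$$z_{k+1} = A z_k + B e_{k-1} \qquad (4.6)$$
$$u_k = C z_k + D e_{k-1}$$
$$A = \begin{pmatrix} \tau\gamma & 0 \\ 0 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} -T_D\gamma(1 - \tau\gamma) \\ \frac{h}{T_I} \end{pmatrix} K, \quad
C = \begin{pmatrix} 1 & 1 \end{pmatrix}, \quad
D = \left(1 + \frac{h}{T_I} + T_D\gamma\right) K$$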
This controller can be implemented in on-line arithmetic following
the scheme in Fig. 4.3.
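The equivalence of the PID algorithm of Eqn. 4.5 and its Jordan form of
Eqn. 4.6 can be verified with a short floating-point simulation. In the
sketch below both forms start from zero states, the list element `ek`
plays the role of e_{k-1}, and the parameter values used in any call are
hypothetical; none of this is taken from the thesis implementation.

```python
def pid_direct(e, K, TI, TD, h, tau):
    """PID of Eqn. 4.5 with integrator state xi and filtered derivative xd."""
    gamma = 1.0 / (h + tau)
    xi = xd = e_prev = 0.0
    u = []
    for ek in e:
        xi = ek + xi
        xd = gamma * (ek - e_prev + tau * xd)
        u.append(K * (ek + h / TI * xi + TD * xd))
        e_prev = ek
    return u


def pid_jordan(e, K, TI, TD, h, tau):
    """Jordan form of Eqn. 4.6 with diagonal A = diag(tau*gamma, 1)."""
    gamma = 1.0 / (h + tau)
    a11 = tau * gamma
    b1 = -TD * gamma * (1.0 - tau * gamma) * K
    b2 = h / TI * K
    d = (1.0 + h / TI + TD * gamma) * K
    z1 = z2 = 0.0
    u = []
    for ek in e:
        u.append(z1 + z2 + d * ek)      # C = (1 1)
        z1 = a11 * z1 + b1 * ek         # two independent state updates
        z2 = z2 + b2 * ek
    return u
```

The two state updates and the output equation do not depend on each
other within one sample, which is why they can be computed in parallel.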
[Figure 4.3: block diagram. e_k = s_k - y_k is formed from the inputs
(register δ = 1, subtractor δ = 2); on-line FIFOs (ol-FIFO) hold the
states z1(k+1), z2(k+1); the constants a11, b1, b2 and d feed adders with
δ = 3 that produce z1(k+2), z2(k+2) and the output u_{k+1}, which goes to
a FIFO and the D/A converter.]
Figure 4.3: Implementation of the PID controller in Jordan form,
control lines are not indicated
However, two special cases should be mentioned where other forms
can be superior to the Jordan form: first for oversampled designs and
second for single input single output (SISO) systems. For oversampled
designs (sampling frequency much higher than necessary) the controller
poles and therefore the constants in the dynamic matrix approach the
value of 1. This causes problems for a fixed-point representation of the
theoretical controller constants because the difference from 1 can be
smaller than the LSD (least significant digit) and thus the implemented
constant value will be equal to 1. A way to overcome this numerical
problem is to use a slightly different controller representation. The
motivation for this modification is related to the delta operator,
introduced by Middleton and Goodwin [MG86]. The authors define a discrete
incremental difference operator (called delta operator) by:
$$\delta = \frac{q - 1}{h} \qquad (4.7)$$
where $q$, $h$ denote the usual forward shift operator and the sampling
period, respectively. The operator $\delta^{-1}$ replaces the usual unit
delay ($q^{-1}$) used in standard shift operator discrete models. This leads
to marginally more complex implementations with superior numerical
properties. State space models are transformed to a representation
which is similar to integrators in continuous-time models. In the case
of Jordan form the introduction of the delta operator splits the
computation of the diagonal values into two parts:
$$a_{ii} \cdot x_i = (1 + a'_{ii}) \cdot x_i = x_i + a'_{ii} \cdot x_i \qquad (4.8)$$
Thus $a_{ii}$ constants with values close to 1 are separated into a unity part
and an easily scalable difference. The differences in scale between $a'_{ii}$
and 1 can be taken into account for the construction of the necessary
multi-adder. For the PID example this modification leads to the system
representation of Eqn. 4.9. The corresponding implementation scheme
is shown in Fig. 4.4. Only one operator changed in comparison to the
scheme in Fig. 4.3. However, truncation errors are reduced remarkably.
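$$z_{k+1} = z_k + A' z_k + B e_{k-1} \qquad (4.9)$$
$$u_k = C z_k + D e_{k-1}$$
$$A' = \begin{pmatrix} \tau\gamma - 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} -T_D\gamma(1 - \tau\gamma) \\ \frac{h}{T_I} \end{pmatrix} K, \quad
C = \begin{pmatrix} 1 & 1 \end{pmatrix}, \quad
D = \left(1 + \frac{h}{T_I} + T_D\gamma\right) K$$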
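[Figure 4.4: block diagram as in Fig. 4.3, with the constant a11 replaced
by a'11: register (δ = 1) and subtractor (δ = 2) forming e_k, on-line
FIFOs (ol-FIFO) for the states, constants a'11, b1, b2 and d, adders with
δ = 3 producing z1(k+2), z2(k+2) and the output u_{k+1} to FIFO and D/A.]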
Figure 4.4: Numerically improved implementation scheme for
oversampled PID controller
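The numerical effect behind Eqn. 4.8 can be illustrated with a small
fixed-point experiment. The sketch assumes a constant stored with a fixed
number of fractional bits and assumes that the difference a' = a - 1 is
stored after scaling by a power of two (the `shift` argument); both the
rounding model and the scaling are illustrative assumptions, since the
thesis handles the scale inside the multi-adder.

```python
def quantize(value, frac_bits):
    """Round a value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def pole_shift_form(a, frac_bits):
    """Pole actually implemented when a itself is quantized."""
    return quantize(a, frac_bits)


def pole_delta_form(a, frac_bits, shift):
    """Pole implemented as 1 + a', with a' stored scaled by 2**shift."""
    a_prime_scaled = quantize((a - 1.0) * 2 ** shift, frac_bits)
    return 1.0 + a_prime_scaled / 2 ** shift
```

For a hypothetical pole a = 0.9995 and 8 fractional bits,
`pole_shift_form` returns exactly 1.0, so the filter pole is lost, whereas
`pole_delta_form(0.9995, 8, 10)` stays within about 3e-7 of the intended
value.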
For SISO systems, compact controller representations in the form of
difference equations are often preferred to the state space descriptions
mentioned above. When multi-operations (see Sect. 4.1) are employed,
they avoid the generation of intermediate values and all of their related
scaling operations. However, the coefficients of these compact
representations often have very different orders of magnitude and thus
the on-line delays can be much higher than in the state space
representation. Therefore, controller implementations are usually
constructed for both representations and a specific design is selected
based on performance comparisons. The PID controller in this compact
representation is given in Eqn. 4.10 and the corresponding scheme in
Fig. 4.5. It has an on-line delay of $\delta_{PID} = 11 + 3 = 14$, which is
considerably higher than that of the Jordan form, $\delta = 3 + 3 = 6$.
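$$u_k = t_1 e_{k-1} + t_2 e_{k-2} + t_3 e_{k-3} + r_2 u_{k-1} + r_3 u_{k-2} \qquad (4.10)$$
with:
$$t_1 = K\left(1 + \frac{h}{T_I} + T_D\gamma\right)$$
$$t_2 = -K\left(1 + \gamma\left(\tau + \frac{h\tau}{T_I} + 2T_D\right)\right)$$
$$t_3 = K\gamma(\tau + T_D)$$
$$r_2 = 1 + \tau\gamma$$
$$r_3 = -\tau\gamma$$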
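[Figure 4.5: block diagram. e_k = s_k - y_k is formed from the inputs
(register δ = 1, subtractor δ = 2); on-line FIFOs (ol-FIFO) hold the
delayed values, which are multiplied by t1, t2, t3, r2, r3 and summed in
one multi-adder (δ = 11); the output u_{k+1} goes to a FIFO and the D/A
converter.]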
Figure 4.5: Compact input–output representation of the PID controller
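The coefficients of Eqn. 4.10 can be checked against the impulse response
of the PID algorithm of Eqn. 4.5. The closed-form impulse response used
below was derived for this sketch from Eqn. 4.5 with zero initial states
and is not quoted from the thesis; function names and any parameter
values are likewise illustrative. The list element `e[k]` plays the role
of e_{k-1}.

```python
def compact_coeffs(K, TI, TD, h, tau):
    """Coefficients (t1, t2, t3) and (r2, r3) of Eqn. 4.10."""
    gamma = 1.0 / (h + tau)
    t1 = K * (1 + h / TI + TD * gamma)
    t2 = -K * (1 + gamma * (tau + h * tau / TI + 2 * TD))
    t3 = K * gamma * (tau + TD)
    return (t1, t2, t3), (1 + tau * gamma, -tau * gamma)


def delayed(seq, k, i):
    """seq[k - i], or 0.0 before the first sample."""
    return seq[k - i] if k - i >= 0 else 0.0


def run_compact(e, t, r):
    """Evaluate the difference equation as one five-term scalar product."""
    u = []
    for k in range(len(e)):
        u.append(t[0] * e[k] + t[1] * delayed(e, k, 1) + t[2] * delayed(e, k, 2)
                 + r[0] * delayed(u, k, 1) + r[1] * delayed(u, k, 2))
    return u


def pid_impulse_response(n, K, TI, TD, h, tau):
    """Impulse response of Eqn. 4.5 (derived here, zero initial states)."""
    gamma = 1.0 / (h + tau)
    a = tau * gamma
    out = [K * (1 + h / TI + TD * gamma)]
    for j in range(1, n):
        out.append(K * (h / TI + TD * gamma * (a - 1) * a ** (j - 1)))
    return out
```

Feeding a unit impulse into `run_compact` should reproduce
`pid_impulse_response` up to rounding errors.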
As shown in Fig. 3.5 of Sect. 3.1.1, the controller dead time for fast
sampling designs implemented in on-line arithmetic is chosen advanta-
geously to be exactly one sampling period. Consequently, the controller
has no direct term. The dead time of one period has to be taken into
account during controller design.
4.3 Reuse of Operators in the Same Algorithm
The multiple use of a limited number of operators is common for se-
quential digit-parallel arithmetic. The operators are thereby scheduled
by the instructions fetched from the program memory. In this case, the
scheduling is simple because a preceding operation has entirely finished
before a subsequent one starts. In on-line arithmetic the situation is
different. The operands are spread out over several subsequent opera-
tions and the synchronization of the data paths has to be ensured at all
times. Therefore, operator scheduling becomes much more challenging.
For a high precision computer working in on-line arithmetic a fea-
sibility study of the scheduling problem was carried out (see the CA-
RESSE project [Agu94]). The goal was to compute large algorithms
with a limited number of on-line arithmetic operators. Two scheduling
strategies were given: a static scheduling for cases where all on-line de-
lays of single operators are constant and a dynamic scheduling for cases
where some operators have variable delay (dependent on operands).
The flexibility of this instruction-based scheduling turns out to be very
costly in terms of logic gates.
For real-time digital controllers with the objective of minimal gate
number and simple implementation of an invariant structure, as shown
in the preceding sections, the concept is far too general. Not only is
the implementation cost too high, but micro-system controllers also
generally have on-line delays that are too small (smaller than the
operand resolution) to allow the reuse of operators in the same
controller. Therefore, reuse of operators is only considered in cases
where several loops are performed by the same controller structure,
possibly with different controller constants.
The reuse encounters three main difficulties: the data-flows and con-
troller constants have to be redirected, the sampling period for each sin-
gle controller has to be ensured and each proper set of state values has
to be stored for the next controller evaluation. For a two axes controller
the necessary modifications are illustrated in Fig. 4.6.
In order to manage this switch operation between several axes, the
interface must generate, in addition to the control line, a signal which
indicates the axis currently being executed (toggle in Fig. 4.6). This
signal is decoded by multiplexers at the connection points and thus
ensures the correct data-flow. In order to avoid a loss of data in the
unused shift registers, they have to be disabled based on the value of
the axis indicator (toggle). In the lower part of Fig. 4.6, the
controller timing is shown. Following the timing scheme in Fig. 3.5, A/D
and D/A conversions have to be synchronized. In the reuse scheme of
Fig. 4.6, every controller output has to be buffered for a part of the
sampling period. This is done by a shift register which is cyclically
loaded with the different controller outputs whilst sending its own
output to the D/A converter (DA Buffer in Fig. 4.6). The shift operation
is controlled by the control line output.
The scheme presented can easily be extended to more axes. The
advantage is that only a small increase in hardware is necessary compared
to the one-axis solution. However, the available computation time for
each controller is only a part of the sampling period and can thus become
critical ($t_{computation} = h/m$, where $m$ is the number of axes and $h$ the
sampling period).
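The three requirements of operator reuse (redirecting constants and data,
keeping a separate state per axis, and sharing the sampling period) can be
sketched in software. The class below is a behavioral illustration only:
the first-order controller structure, the class name and the tuple layout
of the constants are assumptions made for this example and do not
describe the hardware of Fig. 4.6.

```python
class SharedController:
    """One controller structure time-shared between several axes."""

    def __init__(self, constants):
        # constants[axis] = (a, b, c, d) of a first-order controller
        #   z_next = a*z + b*e,  u = c*z + d*e   (hypothetical structure)
        self.constants = constants
        self.states = [0.0] * len(constants)   # one stored state per axis
        self.toggle = 0                        # axis currently executed

    def step(self, error):
        """Execute the current axis, store its state, then switch axis."""
        a, b, c, d = self.constants[self.toggle]
        z = self.states[self.toggle]
        u = c * z + d * error
        self.states[self.toggle] = a * z + b * error
        axis = self.toggle
        self.toggle = (self.toggle + 1) % len(self.constants)
        return axis, u


def run_single(constants, errors):
    """Reference: the same structure used by one axis only."""
    a, b, c, d = constants
    z, out = 0.0, []
    for e in errors:
        out.append(c * z + d * e)
        z = a * z + b * e
    return out


def computation_time_per_axis(h, m):
    """Time slot available to each of m axes within the sampling period h."""
    return h / m
```

Interleaving the error samples of two axes through `SharedController`
gives, per axis, the same outputs as two independent `run_single` runs.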
[Figure 4.6, upper part: one controller (inputs ref and pos, output out,
control line ctr_out) shared by two axes. Multiplexers driven by toggle
select A/D_ref1 and A/D_pos1 or A/D_ref2 and A/D_pos2 as well as the
constants a1 or a2; a decoder derives the enable signals en1 and en2 for
the FIFOs of each axis; the DA Buffer feeds the D/A converters. Lower
part: timing diagram over time for out1 and out2, showing for each axis in
alternation A/D conversion, calculation, output buffering and D/A
conversion.]
Figure 4.6: Reuse of on-line arithmetic operators for a two axes
controller
4.4 Hardware and Software Support
In automatic control systems measurement feedback is used to improve
the dynamic behavior of a physical system. In the past most controllers
were realized in analog electronics. Nowadays, they are often implemented
in the form of digital algorithms in micro-controllers, DSPs or
ASICs. For space applications the choice of the available components
is very restricted and the individual prices are very high due to harsh
environmental conditions (radiation, high accelerations, large temper-
ature range). Most of these applications have prototype character or
they are only built in very small numbers of samples. Therefore, an
ASIC (application specific integrated circuit) development is difficult
to justify. However, in recent years the capacity of field programmable
gate arrays has increased enormously and the trend goes towards an
integration of logic system functions and the dynamic controller into
these FPGAs. In particular, FPGAs with anti-fuse technology, i.e.
circuits which are one-time programmable and afterwards hardwired,
have proved to be compatible with space requirements. The signal
processing of the dynamic controller occupies the majority of the circuit
and thus a size-optimized implementation is advantageous. With FPGAs,
efficient implementations of algorithms in hardware can be made with
software-like design principles (for examples see [Kas98], [Tri94]).
These gate arrays are complex integrated circuits, which usually consist
of two-dimensional arrays of programmable logic cells (often called
CLBs²). The logic cells are designed for a variety of different logical
functions. Their size (known as their grain) can vary considerably from
one device to another, ranging from complex lookup-table based archi-
tectures (coarse grain) to much smaller hardwired elements (fine grain).
The cells are connected to each other through a programmable connection
network. A hardware designer can use an FPGA to implement different
kinds of digital logic circuits by defining the functionality of
each logic cell as well as the connections between the cells. The
performance is close to that of ASICs but with added software
flexibility. In the past, all details of an ASIC had to be simulated
first. This was very costly in simulation time and computing resources,
and technological constraints were sometimes neglected.
With FPGAs the functionality can be tested on real hardware
with software flexibility. Therefore, FPGAs are particularly useful for
rapid prototyping (by replacing ASICs) or directly as controllers for
special mechatronic applications (e.g. space applications). An example
of the internal structure of an FPGA is shown in Fig. 4.7. Figure 4.8
illustrates how an on-line adder can be mapped to two logic blocks of a
Xilinx FPGA (family XC3000).
The controller algorithms are generally composed on a higher abstraction
level in a graphical or a formal description language (e.g. VHDL³).
A formal description is often preferred because of its greater
possibilities for validation and implementation on different hardware
platforms.
A netlist of logic cells is thereby automatically synthesized from VHDL
by adequate software tools. Subsequently, this netlist is mapped to an
FPGA by a specific tool depending on the FPGA type.
In the next section, an on-line arithmetic operator library which
was written in VHDL following the design guidelines given in Chap. 3
is described. It provides the necessary building blocks for a systematic
construction of mechatronic controllers. With these basic arithmetic
modules, implementing a controller using on-line arithmetic is very
similar to standard arithmetic approaches.
² Logic block in Xilinx FPGAs (configurable logic block)
³ VHDL stands for VHSIC (very high speed integrated circuit) hardware
description language.
[Figure 4.7: schematic of an FPGA showing longlines (broadcasting),
communication routers, input-output blocks (IOB), control logic blocks
(CLB) and wire connections.]
Figure 4.7: Internal structure of an FPGA
4.5 On-line Arithmetic Library
In collaboration with A. Tisserand from ENS Lyon a VHDL library of
on-line arithmetic operators has been realized. The goal was to provide
a development platform for the fast implementation of algorithms in on-
line arithmetic on FPGAs following the design rules given above. This
library consists of:
• A complete set of basic operators.
• The control functions described (normalization, initialization and
switch).
• A software validation system.
[Figure 4.8: two logic blocks (CLB 1 and CLB 2), each containing a look-up
table (LUT), flip-flops (FF) and multiplexers (M), driven by clk and
reset; inputs a_i^+, a_i^-, b_i^+, b_i^-; outputs s_{i-2}^+, s_{i-2}^-.]
Figure 4.8: Mapping an on-line adder on 2 logic cells (Xilinx),
LUT = look up table, M = multiplexer, FF = flip flop
Target FPGAs
The library was originally written for two families of target FPGAs:
• Actel FPGAs and especially for the families ACT1 and ACT2.
These FPGAs belong to the group of anti-fuse FPGAs, i.e. they
are only one time programmable. However, they possess the nec-
essary qualifications for space applications, and have small prop-
agation delays.
• Xilinx FPGAs (3000 and 4000 families). These FPGAs have also
been chosen because they are re-programmable (SRAM technology).
The majority of elementary operations (ppm, digit-by-digit
product, ...) can be realized in one of their logic blocks. Every
logic block also has two one-bit registers, which leads to a very
compact implementation of the borrow-save notation.
The choice of how to decompose the on-line operators into elemen-
tary cells was adapted to the target technologies. However, the majority
of the operators can also be synthesized for other targets (Altera FPGAs
for instance), but the circuits obtained may be larger and slower.
Operators realized
The VHDL code for the on-line operators consists of 3 parts:
• packages.
• elementary cells.
• arithmetic operators.
The packages
Three different packages were written. The first one offers functions to
handle integers and fixed-point reals using borrow-save representation.
It provides conversion procedures to change between all sorts of con-
ventional representations (standard binary, 2’s complement, carry-save)
and borrow-save. The second package handles all forms of bit-vector
data-types and the third one adds mathematical functions for the vali-
dation of results.
Elementary cells
All necessary building blocks for on-line algorithms belong to the group
of elementary cells. This includes ppm cells, all kinds of flip-flops,
latches, multiplexers, digit-by-digit multipliers and different shift regis-
ters.
Arithmetic operators
Parameterized versions of different adders were realized (parallel adder,
on-line adder with 2,3,4... inputs), different multipliers (with a con-
stant, general multiplication, squarer, binomier) and a simple division.
The automatic generation of multi-adders, as described in Sect. 4.1 also
belongs to this part. It allows the construction of arbitrary multi-adders
by a simple characterization of input properties. All complex operators
possess a unified interface (as shown in Fig. 3.1) with borrow-save inputs
and outputs and an additional reset line.
As an operator example the VHDL code of an on-line adder with 2
entries is presented and its corresponding implementation scheme shown
in Fig. 4.9.
-- On-line adder with two borrow-save inputs
use work.borrow_save.all;

entity add_ol is
  port (a, b  : in  bs_digit;
        clk   : in  bit;
        reset : in  bit;
        s     : out bs_digit);
end add_ol;

architecture struct of add_ol is
  component ppm
    port (x, y, z : in bit; c, s : out bit);
  end component;
  for all : ppm use entity work.ppm(behav);

  component flip_flop
    port (input, clk, reset : in bit; output : out bit);
  end component;
  for all : flip_flop use entity work.flip_flop(behav);

  -- internal signals between the ppm cells and the flip-flops
  signal bi1m, cim, ci1m, ci1p, si1p : bit;
begin
  -- first ppm cell: combines a.p, b.p and a.m
  ppm1 : ppm port map (a.p, b.p, a.m, ci1p, cim);
  -- second ppm cell: combines the delayed b.m and cim with ci1p
  ppm2 : ppm port map (bi1m, ci1m, ci1p, s.m, si1p);
  -- flip-flops delaying b.m, cim and si1p by one clock cycle
  ff1 : flip_flop port map (b.m, clk, reset, bi1m);
  ff2 : flip_flop port map (cim, clk, reset, ci1m);
  ff3 : flip_flop port map (si1p, clk, reset, s.p);
end struct;
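The structure of add_ol can be mirrored by a bit-level software model. The
Python sketch below assumes that a ppm cell satisfies x + y - z = 2c - s
and that the flip-flops reset to 0; both are assumptions of this model,
since the VHDL of the cells is not listed here. Digits are (p, m) bit
pairs with value p - m, fed most significant digit first, and two zero
digits are appended to flush the result.

```python
from itertools import product

# All bit pairs (p, m) of one borrow-save digit; the digit value is p - m.
BS_DIGITS = [(0, 0), (1, 0), (0, 1), (1, 1)]


def ppm(x, y, z):
    """Assumed ppm cell: returns bits (c, s) with x + y - z = 2*c - s."""
    v = x + y - z
    return (1 if v >= 1 else 0), (v & 1)


def add_ol(a_digits, b_digits):
    """Clock-by-clock model of add_ol; returns the output digits (s.p, s.m)."""
    bi1m = ci1m = s_p = 0                       # flip-flop contents after reset
    zero = (0, 0)
    stream = zip(list(a_digits) + [zero, zero], list(b_digits) + [zero, zero])
    out = []
    for (a_p, a_m), (b_p, b_m) in stream:
        ci1p, cim = ppm(a_p, b_p, a_m)          # ppm1
        s_m, si1p = ppm(bi1m, ci1m, ci1p)       # ppm2
        out.append((s_p, s_m))                  # s.p from ff3, s.m combinational
        bi1m, ci1m, s_p = b_m, cim, si1p        # clock edge: ff1, ff2, ff3
    return out


def value(digits, first_exp):
    """Value of (p, m) digits whose first digit has weight 2**first_exp."""
    return sum((p - m) * 2.0 ** (first_exp - k) for k, (p, m) in enumerate(digits))


def count_mismatches(n):
    """Exhaustively compare the model with a + b for all n-digit operands."""
    errors = 0
    for a in product(BS_DIGITS, repeat=n):
        for b in product(BS_DIGITS, repeat=n):
            # Inputs start at weight 2**-1; outputs, delayed by 2, at 2**1.
            if value(add_ol(a, b), 1) != value(a, -1) + value(b, -1):
                errors += 1
    return errors
```

In this model the first output digit has weight 2^1 while the first input
digit has weight 2^-1, which corresponds to an on-line delay of 2.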
Control functions
As illustrated above, the simple connection of different complex opera-
tors is not sufficient to ensure the correct result. A flow control following
the modular on-line operator concept (see Sect. 3.1) or the global execu-
tion control (see Sect. 3.2) requires additional functions for initialization
and normalization. These extensions were also implemented in param-
eterizable VHDL code and the controller construction can be made as
described in the guidelines presented above.
[Figure 4.9: schematic of add_ol with the cells ppm1 and ppm2 (inputs x,
y, z; outputs c, s) and the flip-flops ff1, ff2, ff3 (input, clk, reset,
out); inputs a.p, a.m, b.p, b.m; outputs s.p, s.m.]
Figure 4.9: On-line adder corresponding to add_ol in VHDL example
Validation
The serial character of the on-line arithmetic implementations can give
rise to many design errors. In most applications a validation of the hard-
ware constructed is necessary before it is applied to the physical process.
A mathematical proof of the algorithm applied is usually not sufficient
because many implementation details are neglected in the mathematical
model.
The standard validation method is to use a set of test vectors for
which the theoretical outputs are known. For each presentation of an
input vector, the theoretical and the computed output are compared.
For single on-line operators a complete test with small operand
resolution is feasible. Testing an on-line multiplier with two 4 digit
inputs for example requires $2^8$ possible combinations. This type of test
takes a few minutes on a logic simulator. For higher operand resolutions
and combinations of on-line operators, a complete test is no longer
possible. Therefore, only a set of test vectors can be applied.
For the on-line library the single operations were tested completely
and for algorithms a test bench was constructed which allows random
tests with a set of input vectors. The idea of the test-bench is to
execute the intended algorithm once with a certified arithmetic library
and in parallel with the VHDL code written for the borrow-save rep-
resentation. The first execution is usually done in a floating-point rep-
resentation. The numerical results of the two executions are compared
and errors are recognized if present.
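The test-bench idea, running the same algorithm once in a reference
arithmetic and once in the arithmetic under test and comparing the
results on random vectors, can be sketched as follows. Here a simple
fixed-point scalar product stands in for the borrow-save VHDL code; the
error bound assumes constants and inputs in [-1, 1], and all names are
illustrative assumptions.

```python
import random


def reference(coeffs, xs):
    """Reference execution in floating point."""
    return sum(c * x for c, x in zip(coeffs, xs))


def fixed_point(coeffs, xs, frac_bits):
    """Stand-in for the design under test: operands rounded to frac_bits."""
    scale = 1 << frac_bits
    acc = sum(round(c * scale) * round(x * scale) for c, x in zip(coeffs, xs))
    return acc / (scale * scale)


def run_test_bench(coeffs, frac_bits, n_vectors, seed=0):
    """Return the list of (vector, error) pairs that exceed the error bound."""
    rng = random.Random(seed)
    q = 2.0 ** -frac_bits
    # Each product of two operands rounded to q/2 deviates by at most q + q^2/4.
    bound = len(coeffs) * (q + q * q / 4)
    failures = []
    for _ in range(n_vectors):
        xs = [rng.uniform(-1.0, 1.0) for _ in coeffs]
        err = abs(fixed_point(coeffs, xs, frac_bits) - reference(coeffs, xs))
        if err > bound:
            failures.append((xs, err))
    return failures
```

An empty list means that no random vector produced a deviation larger
than the quantization bound.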
Chapter 5
The Choice of the Radix
In serial arithmetic the radix has an important influence on computa-
tion time. It is not only the number of digits which becomes smaller
with increasing radix but also the on-line delay δ of the individual op-
erators. However, the gain in computation time often comes with an
important increase in hardware size. The main motivation for a higher
radix is therefore mostly a higher computation speed rather than size
arguments. The question concerning power consumption is situated be-
tween the two. Despite the increase in terms of circuit size the decrease
in necessary clock speed and register sizes can result in a lower power
consumption. These kinds of questions are being investigated in a parallel
project by A. Tisserand et al. at CSEM. At the end of this chapter
the first results of their work will be presented.
In this chapter, a comparison between radix 4 and radix 2 implemen-
tations will be given, especially for different adders. It will be shown
that the chosen digit set and the bit-level representation influence oper-
ator size and on-line delay. This dependence will be worked out on three
different radix 4 adders implemented on FPGAs. The results presented
in this chapter represent part of a Master’s thesis [For97] supervised by
the author.
5.1 Influence on Computation Time
There are two ways to realize an on-line algorithm in higher radix in
a so-called total on-line arrangement (algorithm pipelined with A/D
converter). The first possibility is to use standard radix 2
A/D converters and to regroup several radix 2 digits and convert them
to higher radix digits. The second method directly converts the analog
values to the appropriate radix. For radix 4, the computation time in
both cases is given as:
$$T_{calc4} = (\delta_n + \delta_4)\,T_4$$
where $T_{calc4}$, $\delta_n$, $\delta_4$, $T_4$ are the computation time, number of
radix 4 digits, on-line delay and A/D digit period, respectively.
However, the digit period $T_4$ for a direct conversion to radix 4 will only
be about half of the time needed when using a standard radix 2 converter
with subsequent digital conversion. For an illustration of the 2 cases see
Fig. 5.1.
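[Figure 5.1: timing diagrams of both arrangements. First case: radix 2
A/D converter (digit period T_2) followed by a digit conversion radix 2
-> radix 4 (digit period T_4), operators op 1 to op 4 and D/A. Second
case: radix 4 A/D converter (digit period T_4), operators op 1 to op 4
and D/A. The computation time T_calc4 is marked in both cases.]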
Figure 5.1: Radix 2 and 4 A/D-converters for radix 4 computations
The computation time for radix 2 was:
$$T_{comp2} = (2\delta_n + \delta_2)\,T_2$$
where $T_{comp2}$, $\delta_n$, $T_2$, $\delta_2$ are the computation time, the number
of corresponding radix 4 digits, the period of a radix 2 digit generation
and the on-line delay for radix 2, respectively.
In the case of a conventional radix 2 A/D conversion, calculations in
radix 4 are only faster than in radix 2 ($T_{calc4} < T_{comp2}$) if the following
condition holds:
$$\delta_4 < \delta_2/2 \qquad (5.1)$$
The gains for small algorithms will therefore only be marginal and the
increased hardware effort will be difficult to justify. However, for direct
radix 4 conversion the digit periods of the radix 2 and 4 solutions become
similar ($T_4 \simeq T_2$) and the condition becomes:
$$\delta_4 \le \delta_n + \delta_2 \qquad (5.2)$$
The condition 5.2 is always satisfied and illustrates the potential in
computation time when choosing a higher radix.
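Both conditions follow from comparing the two computation times. The
short derivation below assumes, in line with the statement on digit
periods above, that regrouping radix 2 digits delivers one radix 4 digit
every two radix 2 digit periods ($T_4 = 2T_2$), whereas direct conversion
gives $T_4 \simeq T_2$.

```latex
\begin{align*}
% Conventional radix 2 conversion, assuming T_4 = 2 T_2:
T_{calc4} < T_{comp2}
  &\iff (\delta_n + \delta_4)\, 2T_2 < (2\delta_n + \delta_2)\, T_2
   \iff \delta_4 < \delta_2 / 2 \\
% Direct radix 4 conversion, assuming T_4 = T_2:
T_{calc4} < T_{comp2}
  &\iff (\delta_n + \delta_4)\, T_2 < (2\delta_n + \delta_2)\, T_2
   \iff \delta_4 < \delta_n + \delta_2
\end{align*}
```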
5.2 Implementation of On-Line Arithmetic Radix 4 Adders
The gain in computation time when choosing a higher radix and an
appropriate A/D converter comes at the cost of an increase in
implementation complexity. For radix 4, the A/D converter already has to
separate 4 different voltage levels instead of 2 for radix 2. This makes
the device more complex. For on-line operators, the increase in
complexity will be illustrated by means of three different radix 4
on-line adders. They differ in the choice of the digit set and the
bit-level coding.
5.2.1 Number and Bit-level Encoding
For the adders presented in radix 4, two different digit sets will be used:
$D_1 = \{-3,-2,-1,0,1,2,3\}$ and $D_2 = \{-2,-1,0,1,2\}$. In addition, for
each digit set two kinds of bit-level representations will be used.
Bit-level representations for D1
For digits $a_i$ from $D_1$ the classical representation uses three bits, where:
$$a_i = -4a_i^{4-} + 2a_i^{2+} + a_i^{+}, \quad a_i \in D_1 \quad \text{(Avizienis)}$$
This representation is used for a serial form of the Avizienis adder which
was already introduced in Sect. 2.1.
A second bit-level representation for the digit set $D_1$ is the so-called
differential representation. Every digit is represented by 4 bits as:
$$a_i = 2a_i^{2+} + a_i^{+} - 2a_i^{2-} - a_i^{-}, \quad a_i \in D_1 \quad \text{(differential)}$$
It is used for one operand of a hybrid adder. The adder concerned is
called hybrid because representations of inputs and outputs are
different.
Bit-level representations for D2
Similarly to the bit-level codings for $D_1$, there is one three bit and one
four bit representation of digits $a_i$ from $D_2$. Here the three bit version
will be called Montalvo (name of the author of the PhD thesis [Mon95]
who introduced it):
$$a_i = -2a_i^{2-} + a_i^{+} + a_i^{++}, \quad a_i \in D_2 \quad \text{(Montalvo)}$$
It is used for one operand and the result of the hybrid adder.
Here the 4 bit version will be called Forest (name of the author of
the Master's thesis who introduced it):
$$a_i = a_i^{+} + a_i^{++} - a_i^{-} - a_i^{--}, \quad a_i \in D_2 \quad \text{(Forest)}$$
It will be used for an adder, taking two operands from the digit set
$\{-2,-1,0,1,2\}$ as inputs and producing an output in the same range.
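The four bit-level codings can be compared by enumerating all bit
patterns and the digit values they produce. In the Python sketch below
each coding is described only by its list of bit weights, taken from the
formulas above; the dictionary and function names are illustrative.

```python
from itertools import product

# Bit weights of each coding, in the order of the formulas above.
ENCODINGS = {
    "Avizienis":    (-4, 2, 1),
    "differential": (2, 1, -2, -1),
    "Montalvo":     (-2, 1, 1),
    "Forest":       (1, 1, -1, -1),
}


def reachable_values(weights):
    """Map each digit value to the number of bit patterns that encode it."""
    counts = {}
    for bits in product((0, 1), repeat=len(weights)):
        digit = sum(w * b for w, b in zip(weights, bits))
        counts[digit] = counts.get(digit, 0) + 1
    return dict(sorted(counts.items()))
```

The enumeration shows that the differential, Montalvo and Forest codings
cover exactly their digit sets with some redundancy, while the three
Avizienis bits can also form the value -4, which lies outside $D_1$.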
5.2.2 Functional Description of Radix 4 Adders
The main difference between the three on-line adders implemented for
this comparison is the choice of the digit set and bit-level representation
of inputs and outputs. Each adder in this section will be detailed.
5.2.2.1 Avizienis Serial Adder
This adder uses the algorithm 2.1 presented in Chap. 2. The input
and output representations use Avizienis style and the algorithm can
be decomposed into two elementary cells as shown in Fig. 5.2. The first
cell (Avizienis 1 in Fig. 5.2) has the following characteristics:
Entries:
$$a_i = -4a_i^{4-} + 2a_i^{2+} + a_i^{+}, \quad a_i \in D_1$$
$$b_i = -4b_i^{4-} + 2b_i^{2+} + b_i^{+}, \quad b_i \in D_1$$
Computation: $a_i + b_i = 4t_{i-1} + w_i$
Outputs:
$$t_{i-1} = t_{i-1}^{+} - t_{i-1}^{-}, \quad t_{i-1} \in \{-1, 0, 1\}$$
$$w_i = -2w_i^{2-} + w_i^{+} + w_i^{++}, \quad w_i \in D_2$$
[Figure 5.2: cell Avizienis 1 takes the bits of a_i and b_i and produces
the bits of t_{i-1} and w_i; the bits of w_i pass through registers (Reg)
to cell Avizienis 2, which combines them with the bits of t_{i-1} into
the bits of s_{i-1}.]
Figure 5.2: Serial Avizienis adder
The second cell (Avizienis 2 in Fig. 5.2) is described by:
Entries: $-2w_{i-1}^{2-} + w_{i-1}^{+} + w_{i-1}^{++} + t_{i-1}^{+} - t_{i-1}^{-} \in D_1$
Computation: $s_{i-1} = t_{i-1} + w_{i-1}$
Output: $s_{i-1} = -4s_{i-1}^{4-} + 2s_{i-1}^{2+} + s_{i-1}^{+}, \quad s_{i-1} \in D_1$
The on-line delay is $\delta_{Avizienis} = 1$. The input and output
representations are the same. This allows the easy interconnection of
several of these Avizienis adders.
5.2.2.2 Forest Serial Adder
In the Forest adder, the inputs as well as the outputs use Forest repre-
sentation. This adder was introduced in [For97] and the derivation of
the cell equations can be found therein. The adder is shown in Fig. 5.3.

Figure 5.3: Serial Forest adder

It can also be decomposed into two elementary cells. The equations are
given by:
Entries:
\[ \begin{cases}
a_i = a_i^{+} + a_i^{++} - a_i^{-} - a_i^{--}, & a_i \in D_2 \\
b_i = b_i^{+} + b_i^{++} - b_i^{-} - b_i^{--}, & b_i \in D_2
\end{cases} \]
The first cell (Forest 1 in Fig. 5.3) is described by:
Entries:
\[ a_i^{+} + a_i^{++} - a_i^{-} - a_i^{--} + b_i^{+} + b_i^{++} \in \{-2, -1, 0, 1, 2, 3, 4\} \]
Computation:
\[ a_i^{+} + a_i^{++} - a_i^{-} - a_i^{--} + b_i^{+} + b_i^{++} = w_i \]
Outputs:
\[ w_i = -2w_i^{2-} + 4w_{i-1}^{+} + w_i^{++} \in \{-2, -1, 0, 1, 2, 3, 4\} \]
The second cell (Forest 2 in Fig. 5.3) is described by:
Entries:
\[ -2w_{i-1}^{2-} + w_{i-1}^{+} + w_{i-1}^{++} - b_{i-1}^{-} - b_{i-1}^{--} \in \{-4, -3, -2, -1, 0, 1, 2\} \]
Computation:
\[ -2w_{i-1}^{2-} + w_{i-1}^{+} + w_{i-1}^{++} - b_{i-1}^{-} - b_{i-1}^{--} = \hat{s} \]
Outputs:
\[ \hat{s} = s_{i-1}^{+} + s_{i-1}^{++} - 4s_{i-2}^{-} - s_{i-1}^{--} \]
After Registers:
\[ s_{i-2} = s_{i-2}^{+} + s_{i-2}^{++} - s_{i-1}^{-} - s_{i-1}^{--} \in D_2 \]
The cell computations are more complicated than for the Avizienis
adder. However, as shown in the next section, this does not lead to a
larger realization. The on-line delay of the Forest adder is δForest = 2.
5.2.2.3 Hybrid Serial Adder
This adder has the particularity that the operands have different repre-
sentations, namely one has the Montalvo representation and the other
the differential representation. After a conversion step, the result will
also be in differential representation. Building complex algorithms is
much more difficult with this architecture where the representations are
not uniform. However, as shown in the next section, the size of the
realization outperforms the two other adders. The hybrid adder can be
decomposed into 4 standard ppm or fa cells as shown in Fig. 5.4. In
addition, the conversion block of Fig. 5.5 is necessary to re-convert
the unusual output representation of the hybrid adder back to the
differential representation.

Figure 5.4: Serial hybrid adder without output conversion

Figure 5.5: Output conversion for hybrid adder

The cell equations are derived from the ppm cells already shown in
Chap. 2:
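Entries:
\[ \begin{cases}
a_i = -2a_i^{2-} + a_i^{+} + a_i^{++}, & a_i \in D_2 \\
b_i = 2b_i^{2+} + b_i^{+} - 2b_i^{2-} - b_i^{-}, & b_i \in D_1
\end{cases} \]
Computation:
\[ 2b_i^{2+} - 2a_i^{2-} + 2c_i^{2+} = 4t_{i-1}^{++} - 2t_i^{2-} \quad \text{(ppm1)} \]
\[ a_i^{+} + a_i^{++} + b_i^{+} = 2c_i^{2+} + t_i^{+} \quad \text{(fa1)} \]
\[ -2b_{i-1}^{2-} - 2t_{i-1}^{2-} + 2d_{i-1}^{2+} = -4r_{i-2}^{-} + 2r_{i-1}^{2+} \quad \text{(mmp)} \]
\[ t_{i-1}^{+} - b_{i-1}^{-} + t_{i-1}^{++} = 2d_{i-1}^{2+} - r_{i-1}^{--} \quad \text{(ppm2)} \]
Conversion:
\[ 2r_{i-2}^{2+} - r_{i-2}^{-} - r_{i-2}^{--} = -2s_{i-2}^{2-} + s_{i-2}^{+} + s_{i-2}^{++} \]
Output:
\[ s_{i-2} = -2s_{i-2}^{2-} + s_{i-2}^{+} + s_{i-2}^{++}, \quad s_{i-2} \in D_2 \]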
The on-line delay of the hybrid adder is δHybrid = 2. Due to the
difficulties related to the different representations of inputs and
outputs, this adder is almost exclusively used as part of other
operators, for example in divisions [Mon95].
5.2.3 Comparison of Radix 4 Adders
The different cells of every single adder were described in the form
of lookup tables and synthesized for Actel FPGAs. The optimization
objective was to minimize the circuit size (counted in number of prim-
itives). A comparison of the circuit characteristics obtained is listed in
Tab. 5.1. In the last column the characteristics of the radix 2 on-line
adder (R2) are given.

Name of Adder    Avizienis         Forest           Hybrid           R2
Components       Avizienis1,       Forest1,         3×ppm, fa,       ppm,
                 Avizienis2,       Forest2,         Convert,         mmp,
                 3×Register        7×Register       6×Register       3×R
Size (cells)     41 + 3 = 44       26 + 7 = 33      9 + 6 = 15       5
On-Line Delay    1                 2                2                2
Critical Period  11 cells ≙ 55 ns  8 cells ≙ 40 ns  4 cells ≙ 20 ns  10 ns
Input Bits       6                 8                7                4
Output Bits      3                 4                3                2

Table 5.1: Implementation characteristics of radix 4 adders; the last
column gives the characteristics of the radix 2 on-line adder (R2)
for comparison

5.3 Suitability for Real-Time Control
Table 5.1 clearly indicates the influence of input and output representa-
tions on the size and on-line delay of radix 4 adders. In the case where
a standard radix 2 converter is used (Fig. 5.1 top) only the Avizienis
adder is as fast as the radix 2 implementation (Eqn. 5.1). However, the
necessary circuit size is larger than in radix 2 arithmetic by a factor of 9
(last column of Table 5.1). As the power consumption is a linear func-
tion of clock frequency and gate number, i.e. P = const. × f × nGates,
it will be much higher for radix 4 even at half clock frequency (more
than a factor 4 in case of the adders). Therefore, the combination of a
standard radix 2 A/D converter and the on-line arithmetic computation
in radix 4 is usually avoided.
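
As a rough check of the factor quoted above, the relation
P = const. × f × nGates can be evaluated with the cell counts of
Tab. 5.1. The sketch below is illustrative only; the function name and
the normalization to the radix 2 adder are assumptions made here.

```python
def relative_power(n_gates, clock_ratio, n_gates_ref):
    """Power relative to a reference design, using P = const * f * n_gates."""
    return (n_gates / n_gates_ref) * clock_ratio


# Cell counts from Tab. 5.1: Avizienis radix 4 adder (44) vs. radix 2 adder (5).
# Half the clock frequency is assumed for radix 4, as stated in the text.
ratio = relative_power(n_gates=44, clock_ratio=0.5, n_gates_ref=5)
print(f"radix 4 / radix 2 power ratio: {ratio:.1f}")
```

With these numbers the ratio is above 4, in line with the statement
for the adders.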
In the case where both A/D conversions and computations are
performed in radix 4, the situation is different. According to Eqn. 5.2,
all the adders presented are faster than in radix 2. However, only the
Avizienis and Forest adders are convenient for a modular design. The
Hybrid adder needs additional conversions and a different normalization
algorithm which has a non-zero delay [For97]. The speed improvement
is δimprove = δn + δ2 − δ4 clock cycles, which can be important for
complex algorithms (large δ2 − δ4) with high resolution (large δn). The
decision for radix 2 or a higher radix is dependent on the complexity of
the algorithm and computation time requirements. For most controllers
the speed of a radix 2 implementation is sufficient. This claim is based
on the following reflection:
As shown in Sect. 6.2, the shortest necessary sample periods for
electromechanical systems are around h = 25 µs (≙ fs = 40 kHz).
Assuming a standard A/D converter (T2 = 0.5 µs), this leads to
50 clock cycles per sample period h. For conversion delays of
δA/D + δD/A = 3 and a resolution δn in the range of 16 bit to 24 bit, a
maximum on-line delay of the controller of 23 ≤ δarithmetic ≤ 31 is
obtained. A usual controller has a much lower value and can thus normally
be computed in radix 2. However, for cases where the same controller is
reused several times (see Sect. 4.6), for extremely fast processes or for
controllers including non-linear operations, a radix 4 implementation
can still be necessary.
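
The bound on the on-line delay follows from subtracting the conversion
delays and the number of digits from the clock cycles available per
sample period. As a worked check (the symbol N_h for the number of
cycles per period is introduced here for illustration only):

```latex
N_h = \frac{h}{T_2} = \frac{25\,\mu\mathrm{s}}{0.5\,\mu\mathrm{s}} = 50,
\qquad
\delta_{arithmetic} = N_h - (\delta_{A/D} + \delta_{D/A}) - \delta_n

\delta_n = 16:\quad \delta_{arithmetic} = 50 - 3 - 16 = 31
\qquad
\delta_n = 24:\quad \delta_{arithmetic} = 50 - 3 - 24 = 23
```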
The results shown cannot be extended directly to algorithms includ-
ing multiplications because these are operations which grow in size with
the number of digits of the operands. In a radix 4 multiplier only half
of the digit slices are needed. A detailed comparison for multipliers was
made by Arnaud Tisserand et al. It was shown that a radix 4 multiplier
is only about 1.5 times larger than the corresponding operator in radix
2 and the multipliers occupy the majority of the circuit. This means
that despite the much higher ratio for radix 4 adders the overall circuit
size can still be kept lower than twice the size of the radix 2 imple-
mentation. Therefore, in size the radix 4 operators are always larger,
but a gain in power consumption is still possible (only half the clock
frequency in radix 4 than in radix 2). These results were developed for
ASICs where the optimization possibilities go down to transistor level.
We conclude that radix 2 operators should be preferred when min-
imum size is the main objective. In certain cases computation time
constraints or low power requirements can force a radix 4 implementa-
tion, but a sacrifice in hardware size has to be accepted.
Chapter 6
Comparison to Classical Solutions
In this chapter, on-line arithmetic implementations are compared to
optimized digit-parallel solutions. Global statements for these kinds of
comparisons are difficult to make because the implementation
complexity, and with it speed, size and power consumption, depends
strongly on the specific control algorithm. However, certain guidelines
can be provided which indicate for what kind of applications on-line
arithmetic offers advantages.
In the first section the digit-parallel architectures which are used
for the comparison will be specified. Then indications for the sampling
time requirements and therewith the computation speed requirements
are discussed. These constraints are then used to develop statements
about speed, size and power consumption of on-line arithmetic imple-
mentations with respect to digit-parallel solutions.
6.1 Architectures Compared
Three architectures are compared in this chapter. The first one is the on-
line arithmetic processing scheme introduced in detail in Chap. 3. The
second is an optimized version of the common von Neumann architec-
ture based on digit-parallel operators. The third architecture discussed
is optimized for small calculation times by using several digit-parallel
operators simultaneously. In this section, the two digit-parallel
architectures are described in detail.
6.1.1 Sequential digit-parallel calculation scheme
In order to provide a fair comparison of on-line arithmetic with digit-
parallel arithmetic, first a calculation scheme which is optimized with
respect to size will be introduced. Its architecture is of von Neumann
type, as in most DSPs or micro-controllers, i.e. the instructions are
stored in a memory and there is a fixed number of operators and registers
which can be connected to different buses via tristate bus drivers. The
instructions are decoded sequentially and the information obtained is
used to control the switching of the bus drivers.
As long as the computation time constraints are not too strong, every
different operation is provided only once. With stricter time constraints
parallel execution of several operations can become necessary. This
not only increases the space required for the operators themselves, but
also the space of the flow control functions, i.e. additional registers,
additional or wider busses, larger decoders. It is assumed that the
whole structure (number of operators and registers, decoder size and so
on) is always optimized for the specific real-time controller as for the
on-line arithmetic implementation.
In order to keep the illustration of the sequential structure simple,
the PID controller from Sect. 4.2 is chosen as an illustrative example.
The selected state-space representation is slightly different from the
one shown there:
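\[ z_{k+1} = A z_k + B e_{k-1} \qquad (6.1) \]
\[ u_k = C z_k + D e_{k-1} \]
\[ e_k = s_k - y_k \]
\[ A = \begin{pmatrix} a_{11} & 1 \\ 0 & 1 \end{pmatrix}, \quad
   C = \begin{pmatrix} 1 & 0 \end{pmatrix}, \quad
   B = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \quad
   D = d \]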
The on-line arithmetic solution is given by the scheme in Fig. 6.1.
It computes states and output values in parallel.

Figure 6.1: Implementation scheme of the PID example,
control lines are not indicated

In sequential digit-parallel arithmetic (scheme in Fig. 6.2), first the
new output value is computed and subsequently the state updates are
made. For the repetitive computation of the PID algorithm, 9
instructions and 10 registers are necessary. The instructions are given
in an assembler-like language in Tab. 6.1. The constants are thereby
assumed to be loaded into registers (Reg7–Reg10) in an initialization
phase. This initialization is not indicated in the controller
computation.

No  Operation  Operands          Computation
1   sub        s,y,Reg2          e_k = s_k - y_k
2   mult       Reg2,d,Reg1       d e_k
3   add        Reg1,Reg5,Reg6    d e_k + z_{1,k} = u_{k+1}
4   mult       Reg2,b2,Reg1      e_k b_2
5   mult       Reg2,b1,Reg3      e_k b_1
6   add        Reg4,Reg3,Reg3    e_k b_1 + z_{2,k}
7   add        Reg1,Reg4,Reg4    e_k b_2 + z_{2,k} = z_{2,k+1}
8   mult       Reg5,a11,Reg1     a_{11} z_{1,k}
9   add        Reg1,Reg3,Reg5    a_{11} z_{1,k} + e_k b_1 + z_{2,k} = z_{1,k+1}

Table 6.1: Instructions for PID controller implemented in the scheme
of Fig. 6.2
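
The instruction sequence of Tab. 6.1 can be traced with a few lines of
Python. This is only a behavioural sketch: the register file is
modelled as a dictionary, the constants are addressed by their symbolic
names rather than by Reg7–Reg10, and the function name is hypothetical.
The last operand of each instruction is taken to be the destination, as
the Computation column suggests.

```python
# (operation, operand 1, operand 2, destination), copied from Tab. 6.1
PROGRAM = [
    ("sub",  "s",    "y",    "Reg2"),
    ("mult", "Reg2", "d",    "Reg1"),
    ("add",  "Reg1", "Reg5", "Reg6"),
    ("mult", "Reg2", "b2",   "Reg1"),
    ("mult", "Reg2", "b1",   "Reg3"),
    ("add",  "Reg4", "Reg3", "Reg3"),
    ("add",  "Reg1", "Reg4", "Reg4"),
    ("mult", "Reg5", "a11",  "Reg1"),
    ("add",  "Reg1", "Reg3", "Reg5"),
]

OPS = {
    "sub":  lambda p, q: p - q,
    "add":  lambda p, q: p + q,
    "mult": lambda p, q: p * q,
}


def run_pid_program(regs, s, y):
    """Execute the nine instructions once; regs is updated in place.

    Reg5 holds z1, Reg4 holds z2 and Reg6 receives the new control value.
    """
    regs["s"], regs["y"] = s, y
    for op, src1, src2, dst in PROGRAM:
        regs[dst] = OPS[op](regs[src1], regs[src2])
    return regs["Reg6"]


# Arbitrary integer test values (not from the thesis)
regs = {"Reg1": 0, "Reg2": 0, "Reg3": 0, "Reg4": 5, "Reg5": 7, "Reg6": 0,
        "a11": 2, "b1": 3, "b2": 4, "d": 5}
u_next = run_pid_program(regs, s=10, y=4)
print(u_next, regs["Reg5"], regs["Reg4"])
```

Note that instruction 6 must read the old value of Reg4 before
instruction 7 overwrites it, which is why the order of the two
additions matters.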

Figure 6.2: Circuit structure for sequential digit-parallel arithmetic

6.1.2 Full-parallel digit-parallel calculation scheme
The second calculation scheme is optimized with respect to speed. It is
very similar to the on-line arithmetic scheme, i.e. every operation of the
algorithm is realized by a separate operator, but the single operators are
realized in digit-parallel arithmetic. Usually the operators are arranged
in groups which can be executed in parallel. The intermediate results are
stored in registers. In the case of the PID example, the only operations
are multiplications and additions. These operations can be realized
efficiently in combinatorial operators and thus the intermediate registers
can be avoided by using one large combinatorial operator for the whole
algorithm. This case is illustrated in Fig. 6.3. The necessary clock speed
is equal to the sampling frequency because only one result is produced
per sampling period.

Figure 6.3: Circuit structure for the full-parallel implementation

6.2 Sampling Time Requirements of Microsystems
The selection of the best sampling frequency for a digital control system
is a trade-off between cost and performance. Generally, the performance
of a digital controller improves with increasing sampling frequency up to
a certain limit, but faster sampling also increases the cost of the
individual components (especially A/D converters). Often, higher
precision is also required in fast sampling designs because the
controller equations become numerically sensitive. On the other hand,
some digital controllers are designed in a quasi-continuous manner,
i.e. the design methods assume a highly oversampled design (≥ 40×
closed-loop bandwidth).
In this section, the limits and the influence parameters of the sam-
pling frequency are discussed and some hints are given concerning the
maximum value expected for microsystems. These indications will serve
as computation time constraints for the evaluation of the three proposed
architectures.
An absolute lower bound for the sampling frequency is set by the
specification to track a certain reference input signal. This bound is
given by the sampling theorem. If a system is supposed to track a
reference input up to a certain closed-loop bandwidth fb, the sampling
frequency fs must be at least twice this value:
\[ \frac{f_s}{f_b} > 2 \qquad (6.2) \]
Eqn. 6.2 provides the fundamental lower bound on the sampling
frequency. In practice, however, this theoretical lower bound would be
judged far too slow for an acceptable time response. Often a higher
factor is chosen to provide some smoothness in the response and to
limit the magnitude of the control steps. In general, a factor between 6
and 40 is selected:
\[ 6 \le \frac{f_s}{f_b} \le 40 \qquad (6.3) \]
In Fig. 6.4 an example is shown where a double integrator (satellite
attitude control problem) is stabilized by a constant state feedback.
The sampling frequency is set to different multiples of the resulting
system bandwidth. For a low value of fs/fb one of the states
(the acceleration) has large discontinuities and would have a strong
tendency to excite any flexible modes and produce high stresses in the
actuator and its attached parts. The degree of smoothness required in a
particular design depends on the application, for example on
specifications of power supply peak voltage and current, on acceptable
stress levels with respect to lifetime, or on the exportation of
reaction forces.
In addition to the smoothness issue, it is often important to reduce
the delay between a change of the reference signal and the related sys-
tem response. A reference input can occur at any time throughout a
sampling period. Therefore, there can be a delay of up to a full sampling
period before the controller reacts. A rule suggests (see [FPW98]) to
keep the time delay to 10% of the rise time (tr = 1.8/fb), which leads
to a demand of:
\[ \frac{f_s}{f_b} \ge 20 \qquad (6.4) \]

Figure 6.4: Step responses of a controlled double integrator at different
sampling frequencies (four panels: system sampled at fs = 4 fb, 8 fb,
20 fb and 40 fb; horizontal axis: fb t; vertical axis: control value (x),
states (∗, o))

A third aspect which motivates a high sampling frequency is
disturbance rejection. Disturbances enter a system with various
characteristics ranging from steps to white noise. For the purpose of
determining the sampling frequency, the higher-frequency random
disturbances are the most influential. They are fast compared to the
plant and the sampling frequency and can be considered to be white.
In [FPW98] it is shown that disturbance rejection improves
significantly up to a factor of 20 and is almost constant above it. This
leads to the same demand as in Eqn. 6.4. Quantization errors due to
limited word length have a negative influence on disturbance rejection.
The criteria mentioned above can be summarized by concluding that
in most of the cases, a factor of fs/fb = 20 is sufficient. In some special
cases (low-order anti-aliasing filter, high demands on smoothness) a
slightly higher value can become necessary (up to factor 40).
Microsystems are generally fast because of their small inertia.
Nevertheless, the attainable bandwidths of mechanical systems are limited
by the sensitivity to unmodeled dynamics. In most cases the closed-loop
bandwidth cannot be pushed higher than a factor of 10 above the
open-loop bandwidth. Today, this corresponds to a maximum value of the
closed-loop bandwidth of about 2 kHz. In special cases, current loops
can demand twice this value (around 4 kHz). With the factor of 20
between closed-loop bandwidth and sampling frequency, this leads to
sampling frequencies in the range of:
\[ f_{s,max} = 40 \ldots 80 \ \mathrm{kHz} \qquad (6.5) \]
That means that all controller calculations have to be repeated every
13 . . . 25 µs, including A/D and D/A conversions.
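
The range in Eqn. 6.5 is a direct product of the closed-loop bandwidth
and the oversampling factor. A minimal sketch of this arithmetic, with
a hypothetical helper name and the factor of 20 taken from the
discussion above:

```python
def sampling_requirements(closed_loop_bw_hz, factor=20):
    """Return (sampling frequency in Hz, sampling period in s)."""
    fs = factor * closed_loop_bw_hz
    return fs, 1.0 / fs


for fb in (2e3, 4e3):   # closed-loop bandwidths of about 2 kHz and 4 kHz
    fs, h = sampling_requirements(fb)
    print(f"fb = {fb / 1e3:.0f} kHz -> fs = {fs / 1e3:.0f} kHz, "
          f"h = {h * 1e6:.1f} us")
```

The shorter period evaluates to 12.5 µs, which the text rounds to 13 µs.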
6.3 Speed, Size, and Power Consumption
In this section the three different design methods (on-line arithmetic,
sequential and full-parallel digit-parallel arithmetic) are compared with
respect to speed, size and power consumption. The qualitative results
will help to judge if an on-line arithmetic implementation for a planned
controller is advantageous or if another implementation method should
be chosen.
6.3.1 Speed
The specified maximum sampling frequency (see Eqn. 6.5) is nowadays
attainable by all three design methods. For the digit-parallel designs
this is owing to the fact that the clock frequency of the arithmetic
is independent of the A/D conversion speed and it can thus be made
sufficiently fast.
With a typical conversion time of 10 µs (for a 16-bit converter) there
are still 3 . . . 15 µs left for the full-parallel solution. Operators
can be clocked at several MHz and therefore even complex controllers can
be computed sufficiently fast. In the sequential digit-parallel case the
controller can be formulated in a way that most complexity is in the
state equations and only few operations are necessary for the output
equations. This makes almost the whole sampling period available for
computation. With a clock frequency of several MHz, there is enough time
to finish the controller computation in one sampling period.
For on-line arithmetic the situation is similar to the sequential case.
The controller complexity can be shared between state and output
equations, which keeps the individual operator delays low. Taking the
timing scheme of Fig. 3.5, the A/D sampler delay (δA/D) and D/A delay
(δD/A) are together typically about 2 µs. With a typical clock speed of
the A/D conversion of 2 MHz and a resolution of 16 bit, this leads to a
maximum delay in the forward path of the calculation (δarithmetic) of:
\[ \delta_{arithmetic} = \delta_{period} - \delta_{D/A} - \delta_{A/D} - \delta_n = 6 \ldots 30 \qquad (6.6) \]
where δarithmetic, δperiod, δD/A, δA/D, δn are the maximum delay of
the output equation, the number of clock cycles per sampling period
(26 . . . 50), the D/A converter delay (1), the sampler delay (3), and
the number resolution (16), respectively. The value of 6 . . . 30 for the
maximum delay of the output equation leads to a value in the range
of 9 . . . 33 for the maximum delay of an operator in the state loops
(δarithmetic + δD/A + δA/D − 1). For a multi-adder this value already
corresponds to an operator of more than 200 entries, which is greater
than any controller used in common applications. For cases with higher
requirements on computation speed, in the on-line arithmetic scheme
the controller computation and the A/D conversion can be separated
and a faster clock can be used for computation.
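
The delay budget of Eqn. 6.6 and the derived bound for the state loops
can be reproduced as follows. The function names are hypothetical; the
default delays (1, 3 and 16 clock cycles) and the 2 MHz clock are the
values quoted in the text.

```python
def cycles_per_period(h_us, clock_mhz=2.0):
    """Number of clock cycles available in a sampling period of h_us microseconds."""
    return round(h_us * clock_mhz)


def delay_budget(period_cycles, d_da=1, d_ad=3, d_n=16):
    """Return (max delay of output equation, max delay of a state-loop operator)."""
    d_arith = period_cycles - d_da - d_ad - d_n   # Eqn. 6.6
    d_state = d_arith + d_da + d_ad - 1           # bound for the state loops
    return d_arith, d_state


for h_us in (13, 25):
    n = cycles_per_period(h_us)
    print(h_us, n, delay_budget(n))
```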
The arguments given show that a decision based on the computation
time as the only criterion does not favor any of the designs presented.
All of them can handle a sampling period in the range of 13 . . . 25 µs
(see Eqn. 6.5). However, the effort necessary in terms of circuit size
and power consumption is very different. These two criteria are the
subjects of the next two subsections.
6.3.2 Circuit Size
It is obvious that the circuit size of the full-parallel solution is far
larger than that of the two other designs. In comparison to the
sequential scheme many more operators are used, and in comparison to
on-line arithmetic the individual operators are much larger. A 16-bit
multiplier, for instance, is 6 times larger in full-parallel arithmetic
than in on-line arithmetic.
The comparison between sequential digit-parallel arithmetic and
on-line arithmetic is more complicated. In the sequential design not only
the operator size has to be taken into account, but also the whole flow
control part including memory, decoders, bus drivers and registers. The
modular design of the on-line arithmetic operators avoids these by a
distributed control scheme. However, with every additional operation
in the control algorithm, a new on-line arithmetic operator has to be
added. In the sequential scheme a new operator is forced only when the
sequence of the existing operators becomes too slow for the chosen
sampling period. Non-linear operations, like divisions or square roots,
play an important role herein. Their digit-parallel operators are large
and their computation time is comparable to on-line arithmetic operators
(e.g. O(n) for division instead of O(log2 n) for multiplication), but in
on-line arithmetic other operations are executed in parallel with them
(digit and instruction pipeline). Therefore, multiple copies of the same
digit-parallel operators are already forced much earlier. This situation
is illustrated in Fig. 6.5 where the circuit size is plotted over the
number of individual operations of the control algorithm.
The slope of the on-line arithmetic graph is given by the continu-
ous increase of the number of operators. In the graph of the sequential
scheme the small slope indicates the increase in necessary flow con-
trol logic. As soon as the computation time limit is reached and an
additional operator becomes necessary, an important increase in gate
number appears. This is indicated by the steps of the sequential graph.
The second curve for the sequential scheme indicates that for algorithms
containing non-linear operations, the critical limit where both designs,
on-line arithmetic and digit-parallel arithmetic, have the same size, is
reached much later. As will be seen in Sect. 7.2, the critical limit for
controllers containing only multiplications and additions (with a resolu-
tion of 16 bit) is about 12 multiplications. All single on-line arithmetic
controllers smaller than this number or the ones including non-linear
operations are advantageous with respect to size. The two application
examples (PID and Piezo demonstrator, presented in Chap. 7), are in-
dicated in the graph.
The reflections show that, with respect to size, on-line arithmetic
implementations of linear controller algorithms are only advantageous
for small to medium size designs (12–15 multiplications). Non-linear
operations force a higher number of individual operators in the
digit-parallel circuit and thus shift this limit to higher values.

Figure 6.5: Circuit size with regard to controller complexity
(horizontal axis: number of operations per sampling period; vertical
axis: circuit size; curves: on-line arithmetic, sequential
digit-parallel with only linear operations, sequential digit-parallel
with also non-linear operations; marked points: PID demonstrator and
Piezo demonstrator)

The main reason for this result is that the computation time
constraints in control are not strong enough. In other signal processing
problems with higher requirements on computation time, like digital
filtering or feed-forward processing of large data streams, operator
copies are already forced for a smaller number of operations and the
size relation would be different.
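
The qualitative shape of the size comparison (a straight line for
on-line arithmetic, a gently rising staircase for the sequential
scheme) can be reproduced with a toy model. All numbers below are
hypothetical and in arbitrary units; they are chosen only to show how
tighter timing, which forces operator copies earlier, moves the
break-even point to a larger number of operations.

```python
import math


def size_online(n_ops, op_size=1.0):
    """One on-line operator per operation of the algorithm."""
    return n_ops * op_size


def size_sequential(n_ops, ops_per_period=20, op_size=2.0,
                    base=8.0, flow_per_op=0.3):
    """Flow control overhead plus one operator copy per block of operations
    that fits into a sampling period."""
    copies = math.ceil(n_ops / ops_per_period)
    return base + flow_per_op * n_ops + copies * op_size


def break_even(max_ops=200, **kwargs):
    """Smallest number of operations for which on-line arithmetic is larger."""
    for n in range(1, max_ops + 1):
        if size_online(n) > size_sequential(n, **kwargs):
            return n
    return None


print(break_even())                    # relaxed timing
print(break_even(ops_per_period=5))    # tight timing: copies forced earlier
```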
6.3.3 Power Consumption
In contrast to speed and size, power consumption aspects are mainly
addressed on ASICs. On FPGAs, many gates are clocked by design.
Additionally, the long connections, due to the general purpose structure,
make a large contribution to the high average power consumption.
The reasoning for low power designs, presented in [TMP99] for the
field programmable on-line operator (FPOP) project, was based on the
objective of high throughput. The goal of this project was to develop a
low-power high-throughput reconfigurable circuit based on on-line arith-
metic operators. High throughput in this context means that the digit-
pipeline in the circuit is kept as busy as possible. This implies new
inputs to the circuits before the results of the previous calculation reach
the output. However, in controller implementations it is mainly the ab-
solute computation time which is of most interest. This was illustrated
by the timing diagram of Fig. 3.5. In order to respect correct timing,
the old control value is sent to the physical system before a new sample
is taken. Therefore, the reasoning for the low power consumption of
on-line operators is different in this case.
The central relation describing the average power consumption of a
circuit is given by:
\[ P = a\,C\,V^2 f \qquad (6.7) \]
where P , a, C, V , f are the average power consumption, a constant,
the circuit capacitance, the supply voltage and the clock frequency, re-
spectively. Based on this formula, the three design methods presented
are compared. The minimum supply voltage is thereby a function of
clock frequency and granularity (largest combinatoric block) of individ-
ual operators. The clock frequency is mainly given by the complexity
of the control algorithm and the design method chosen.
Sequential digit-parallel design (sp)
In a sequential design, all operators in the ALU are always clocked and
the execution consists of changing their interconnection with the bus.
It is important to note that in this design method the additional
memory and the parallel busses make a significant contribution to the
circuit capacitance. The required clock frequency for sequential
execution is much higher than for the full-parallel design. Therefore,
even the smaller granularity does not allow a lower supply voltage than
in the full-parallel design. These remarks can be summarized as:
\[ C_{sp} = C_{connections} + C_{operators} \qquad (6.8) \]
\[ V_{sp} = V(f, gr) = V_{fp} \]
\[ f_{sp} > f_{fp} \]
Full-parallel design (fp)
In the full-parallel design, the operator capacitance is much higher
(factor m, m ≫ 1) because every single operation of the control
algorithm is realized by a digit-parallel operator. However, short
direct connections between the individual operators are used and busses
are avoided. Therefore, the contribution of the connections is smaller
(factor 1/n, n > 1). As mentioned above, the clock frequency is much
lower because of the parallelism in the execution. The supply voltage is
about the same as in the sequential case. The results can be summarized
as:
\[ C_{fp} = \frac{1}{n} C_{connections} + m\,C_{operators} \qquad (6.9) \]
\[ V_{fp} = V_{sp} \]
\[ f_{fp} < f_{sp} \]
On-line arithmetic (ol)
In on-line arithmetic the serial computation leads to a smaller number
of connections (additional factor 1/p for the connection capacitance,
p > 1) than in the other two cases. The operator size is, as discussed
in Sect. 6.3.2, up to a certain controller complexity smaller than in
the digit-parallel designs. The serial computations also lead to a small
granularity and therefore the supply voltage can be decreased by a
factor of 1/o, o > 1. The clock frequency is often lower than in the
digit-parallel cases because of the digit-pipeline (overlap of
successive operations). These criteria can be summarized as:
\[ C_{ol} = \frac{1}{p \cdot n} C_{connections} + C_{\text{on-line operators}} \qquad (6.10) \]
\[ V_{ol} = \frac{1}{o} V_{sp} \]
\[ f_{ol} \le f_{sp} \]
From the Eqns. 6.7, 6.8, 6.9 and 6.10 it follows that on-line arith-
metic is advantageous with respect to power consumption far beyond
the limit where sequential digit-parallel designs and on-line arithmetic
designs have the same size. This is mainly due to the quadratic influence
of the reduced supply voltage and therefore due to the small granularity
of the on-line arithmetic computation.
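
To make the quadratic influence of the supply voltage concrete, the
relations 6.7 to 6.10 can be evaluated for one set of normalized
values. Every number in this sketch is hypothetical and serves only as
an illustration; in particular the on-line operator capacitance is
deliberately chosen so that the on-line circuit is almost as large as
the sequential one.

```python
def power(a, C, V, f):
    """Average power according to Eqn. 6.7: P = a * C * V^2 * f."""
    return a * C * V ** 2 * f


def capacitances(c_conn, c_op, c_op_online, m, n, p):
    """Circuit capacitances of the three designs (Eqns. 6.8 to 6.10)."""
    c_sp = c_conn + c_op
    c_fp = c_conn / n + m * c_op
    c_ol = c_conn / (p * n) + c_op_online
    return c_sp, c_fp, c_ol


# Hypothetical normalized values, not taken from the thesis
c_sp, c_fp, c_ol = capacitances(c_conn=4.0, c_op=2.0, c_op_online=4.0,
                                m=8, n=2, p=2)
v_sp, o = 1.0, 2.0
p_sp = power(1.0, c_sp, v_sp, f=1.0)
p_ol = power(1.0, c_ol, v_sp / o, f=1.0)   # f_ol = f_sp assumed (worst case)
print(c_sp, c_ol, p_ol / p_sp)
```

Even with comparable capacitances and equal clock frequency, halving
the supply voltage reduces the power by a factor of four in this
example.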
Note that LSDF arithmetic behaves very similarly to on-line arith-
metic for linear operations. The gain in operator size is compensated by
additional registers for synchronization purposes (extra digits in multi-
plications, see Sect. 1.1) and by a higher clock speed in the LSDF case
(more wait cycles).
Note that speed, size and power consumption are not the only crite-
ria influencing the choice of a particular design. The number of connec-
tions for the external interface can also be an important issue. For this
point, on-line arithmetic is advantageous in comparison to the digit-
parallel designs because of its serial interface. This is even more im-
portant with respect to successive approximation A/D converters which
normally work serially. They can be added directly to the digit-pipeline.
An external serial interface can also be easily extended (e.g. for
multiple-input multiple-output (MIMO) systems) because connection
bandwidth is less critical. Another important point is the simplicity of
the design process. Provided that an on-line arithmetic library is
available, constructing a controller by connecting modular operators is
very simple, whereas the construction of a bus structure demands much
more insight into circuit architectures.
Chapter 7
Applications
Two control algorithms were implemented in on-line arithmetic. The
first example, a numerical PID controller for a space application, was
chosen to show the size advantage of on-line arithmetic operators. The
number of individual operations in this example is so low that the mul-
tiple operator copies for the on-line arithmetic scheme are still much
smaller than the arithmetic unit in the digit-parallel case. The sec-
ond example treats a two-degrees-of-freedom controller for a piezo sys-
tem. In this case many more individual operations have to be exe-
cuted. However, the computation time constraint is not strong enough
to force multiple operator copies in the digit-parallel case. This is a
disadvantageous situation for the on-line arithmetic implementation.
Nevertheless, the on-line arithmetic solution outperforms the
digit-parallel one in circuit size and number of necessary clock
cycles.
For both examples a short description of the application consid-
ered is given and the controller equations are presented in the original
and transformed representation. The hardware requirements are quite
different. For the space application, one-time programmable Actel FPGAs
(anti-fuse technology) are used. They are the only kind of gate array
which has space certifications. In contrast, the piezo system uses
a re-programmable development system. Both hardware platforms are
described in detail and comparisons to digit-parallel implementations
are given.
7.1 PID-Demonstrator
As a first benchmark example, the standard PID (proportional-integral-
differential) algorithm was chosen. It is still the most frequently used
algorithm in today’s industrial applications. In space applications it
is often used as a current or position controller. Many of these
implementations are still analog because of the fast dynamics of each
loop (depending on the system characteristics, sampling rates of up to
40 kHz are required). In this section it will be shown that on-line
arithmetic allows a very efficient implementation of this algorithm and
that the computation time constraints can easily be met even with slow
system clocks (sampling period of 25 µs with a 760 kHz oscillator).
7.1.1 Controller Representation
For the PID algorithm, a version with filtered differential part was cho-
sen. The controller equations are given as follows:
\[
u_k = K \left( e_{k-1} + \frac{h}{T_I}\, x_{i,k-1} + T_D\, x_{d,k-1} \right) \tag{7.1}
\]
with
\[
\begin{aligned}
x_{i,k-1} &= e_{k-1} + x_{i,k-2} \\
x_{d,k-1} &= \gamma \left( e_{k-1} - e_{k-2} + \tau\, x_{d,k-2} \right) \\
e_k &= s_k - y_k
\end{aligned}
\]
where $e_k = s_k - y_k$ is the difference between set-point $s_k$ and
measurement $y_k$, $x_{i,k}$ and $x_{d,k}$ the controller states, $u_k$ the
controller output, $h$ the sampling period, $\tau$ the time constant of the
differential part filter, $\gamma = 1/(h + \tau)$ and $K$, $T_I$, $T_D$ are
the proportional, integral and differential gains, respectively.
As already mentioned in Sect. 4.2, the controller equations Eqn. (7.1)
are not in an appropriate form for an implementation in on-line
arithmetic. The combination of state update and controller output
computation leads to a large delay and to scaling problems for the
intermediate results. With the transformation to a special state-space
representation (Jordan form), the complexity can be moved into the
time-uncritical state update loops while the output equation is
simplified. The new controller equations are given by:
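\[
\begin{aligned}
z_{k+1} &= A z_k + B e_{k-1} \\
u_k &= C z_k + D e_{k-1}
\end{aligned} \tag{7.2}
\]
\[
A = \begin{pmatrix} \tau\gamma & 0 \\ 0 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} -T_D \gamma (1 - \tau\gamma) \\ \dfrac{h}{T_I} \end{pmatrix} K, \quad
C = \begin{pmatrix} 1 & 1 \end{pmatrix}, \quad
D = \left( 1 + \frac{h}{T_I} + T_D \gamma \right) K
\]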
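The equivalence of the two representations can be checked numerically. The following minimal sketch simulates the recursion of Eqn. (7.1) and the Jordan form of Eqn. (7.2) for the same error sequence, assuming zero initial states. The function names and all numerical parameter values are illustrative assumptions and are not taken from the demonstrator.

```python
# Sketch: compare the PID recursion (7.1) with its Jordan form (7.2).
# Function names and parameter values are illustrative assumptions.

def pid_original(e, K, TI, TD, h, tau):
    """Outputs of Eqn. (7.1); e[j] plays the role of e_{k-1}."""
    gamma = 1.0 / (h + tau)
    xi = xd = e_prev = 0.0            # zero initial states assumed
    u = []
    for ej in e:
        xi = ej + xi                              # integral state
        xd = gamma * (ej - e_prev + tau * xd)     # filtered differential state
        u.append(K * (ej + h / TI * xi + TD * xd))
        e_prev = ej
    return u

def pid_jordan(e, K, TI, TD, h, tau):
    """Outputs of Eqn. (7.2) with A = diag(tau*gamma, 1), C = (1 1)."""
    gamma = 1.0 / (h + tau)
    a11 = tau * gamma
    b1 = -K * TD * gamma * (1.0 - tau * gamma)
    b2 = K * h / TI
    d = K * (1.0 + h / TI + TD * gamma)
    z1 = z2 = 0.0
    u = []
    for ej in e:
        u.append(z1 + z2 + d * ej)    # output equation: only one product is time critical
        z1 = a11 * z1 + b1 * ej       # state updates can run after the output
        z2 = z2 + b2 * ej
    return u

if __name__ == "__main__":
    e = [1.0, 0.5, -0.25, 0.0, 0.75] * 20
    args = (2.0, 1e-3, 1e-4, 25e-6, 1e-4)   # K, TI, TD, h, tau (assumed values)
    diff = max(abs(a - b) for a, b in zip(pid_original(e, *args), pid_jordan(e, *args)))
    print("largest deviation:", diff)
```

In the Jordan form the output needs only the product with the current error plus two stored states, which is what moves the complexity into the state update loops.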
After simulation of some physical processes which are to be con-
trolled by the demonstrator, the ranges of the 4 controller parameters
were chosen as indicated in Eqn. 7.3. Two operator resolutions were
tested, first a 12-bit value and then a 24-bit value.
\[
\begin{aligned}
0 &\le a_{11} < 1 \\
-0.5 &\le b_1 < 0.5 \\
0 &\le b_2 < 1 \\
0 &\le d < 128
\end{aligned} \tag{7.3}
\]
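As an illustration of how the ranges of Eqn. 7.3 constrain a design, the following sketch computes the four parameters of Eqn. (7.2) from the physical PID parameters and reports those that fall outside the admissible intervals. The helper names and the example values are assumptions made for this sketch only.

```python
# Sketch: map physical PID parameters to (a11, b1, b2, d) and check Eqn. 7.3.
# Helper names and example values are illustrative assumptions.

RANGES = {
    "a11": (0.0, 1.0),
    "b1": (-0.5, 0.5),
    "b2": (0.0, 1.0),
    "d": (0.0, 128.0),
}

def pid_jordan_parameters(K, TI, TD, h, tau):
    """Controller parameters of the Jordan form, Eqn. (7.2)."""
    gamma = 1.0 / (h + tau)
    return {
        "a11": tau * gamma,
        "b1": -K * TD * gamma * (1.0 - tau * gamma),
        "b2": K * h / TI,
        "d": K * (1.0 + h / TI + TD * gamma),
    }

def out_of_range(params):
    """Names of parameters violating lower <= value < upper."""
    return [name for name, value in params.items()
            if not (RANGES[name][0] <= value < RANGES[name][1])]

if __name__ == "__main__":
    params = pid_jordan_parameters(K=2.0, TI=1e-3, TD=1e-4, h=25e-6, tau=1e-4)
    print(params, out_of_range(params))
```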
7.1.2 On-Line Arithmetic Computation Scheme
The resulting operational scheme for our PID example is shown in
Fig. 7.1 for an n-bit resolution. It includes 2 inner loops with
appropriate registers and 3 Vector×Vector operators with simplified final
adders. Every operator was realized as a modular on-line arithmetic
operator. That means that the values z1, z2, and z3 are the result of a
normalization algorithm and that the operator reset is realized by one
initialization operator for each operation. For clarity,
these flow control blocks were omitted in Fig. 7.1.
Figure 7.1: Implementation scheme of the PID demonstrator,
control lines are not indicated
In order to reduce the length of the critical combinatorial paths
between the entry of the subtraction and the output of the multipliers,
the inputs of the multipliers are delayed by one clock cycle. This reduces
the clock period τ (not to be confused with the filter time constant of
Sect. 7.1.1). Due to the choice of reasonable controller parameters
and an analog output amplifier, the on-line delay of the controller is
δarithmetic = 6. For the converters chosen, the times τ × δA/D = 1 µs
and τ × δD/A = 0.4 µs are given. The sampling period was fixed to a
reasonable value of 25 µs. The resulting clock frequencies and numbers of
intermediate zeros are listed in Tab. 7.1.
Criteria (TSampling = 25 µs)    12 bit        24 bit
Intermediate Zeros              δzeros = 7    δzeros = 8
Clock Speed                     760 kHz       1.3 MHz
Table 7.1: Clock Speed and Intermediate Zeros for PID Demonstrator
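The clock speeds of Tab. 7.1 are consistent with a sampling period that lasts n + δzeros clock cycles. This reading is an assumption made for the following sketch (it is not stated explicitly here), but it reproduces both table entries: 19 cycles in 25 µs give 760 kHz, and 32 cycles give 1.28 MHz, which is rounded to 1.3 MHz in the table.

```python
# Sketch: clock frequency from resolution, intermediate zeros and sampling period.
# Assumption: one sampling period lasts (n_bits + delta_zeros) clock cycles.

T_SAMPLING = 25e-6  # s, sampling period of the PID demonstrator

def clock_frequency(n_bits, delta_zeros, t_sampling=T_SAMPLING):
    """Clock frequency in Hz needed to fit all cycles into one sampling period."""
    cycles_per_period = n_bits + delta_zeros
    return cycles_per_period / t_sampling

if __name__ == "__main__":
    for n_bits, delta_zeros in ((12, 7), (24, 8)):
        f_khz = clock_frequency(n_bits, delta_zeros) / 1e3
        print(f"{n_bits} bit: {f_khz:.0f} kHz")
```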
7.1.3 Hardware Implementation
The PID algorithm mentioned above was implemented together with the
process interface as well as a PC interface on a BICC-VERO prototyping
board for IBM PC AT computers. It holds 2 Actel FPGAs (1× 1240A,
1×1280A), some logic for a PC bus interface as well as 2 A/D and 2 D/A
converters. Different controllers can be implemented by exchanging
the FPGAs and the monitoring of run time values is provided by the
PC bus interface. In order to verify the design rules, the PID scheme
shown above (Fig. 7.1) with resolutions equal to 12 bit and 24 bit was
implemented in the Actel 1280A device (a low cost FPGA in anti-fuse
technology, for specification details see [Act96]).
The necessary logic for the PC bus interface as well as the process
interface for the on-line arithmetic controller were entirely realized in
the smaller Actel 1240A FPGA. This was mainly for demonstration
purposes in order to separate the digital controller from all interfaces.
Only 18% of the circuit is used. For the final solution the process
interface could be included in the controller FPGA. The interface realized
is illustrated schematically in Fig. 7.2. In a structure like this one the
on-line arithmetic controller can easily be exchanged and the execution
control is reduced to the appropriate generation of the control line.
Figure 7.2: Unified controller interface in the PID demonstrator
7.1.4 Controller Performance
The on-line design was compared to a sequential realization in digit-
parallel arithmetic and a full parallel solution (see Table 7.2). For the
digit-parallel operators, a carry-look-ahead adder and a Wooley [Mul89]
multiplier were used.
                       Parallel Arithmetic
                       Sequential     Full-Parallel    On-Line
12 bit (δzeros = 7)
  Clock Speed          640 kHz        40 kHz           760 kHz
  Actel Cells          1450 (118%)    2600 (212%)      645 (54%)
24 bit (δzeros = 8)
  Clock Speed          1.2 MHz        40 kHz           1.3 MHz
  Actel Cells          3700 (300%)    9200 (749%)      1100 (89%)
Table 7.2: Size and Clock Speed of PID Implementation
Only u is computed immediately after the A/D conversion. The
state updates of z1 and z2 are carried out in parallel with the following
conversion (see Fig. 7.3).
Figure 7.3: Timing of digit-parallel and on-line solutions
Owing to the simplicity of the PID controller, the clock frequencies
are comparable for the on-line arithmetic and sequential approaches.
However, the on-line solutions are much smaller and fit on one single
FPGA even for 24-bit resolution. The full parallel solutions are much
larger as expected and their low clock speed (equal to sampling
frequency) is due to the combinatorial nature of the operators chosen.
The power consumption could not be evaluated explicitly (FPGA
implementation), but it would certainly be higher in the parallel cases
(more gates and bus traffic, high number of transitional states).
The size advantages of the on-line arithmetic implementation in the
PID example are due to the much smaller size of on-line arithmetic
individual operators. This result can be extended to applications where
several PID controllers are used in parallel in one system (e.g. controllers
for multiple axes).
In cases where very fast sampling is required (in the range of sev-
eral microseconds) and commercially available components should be
used, the sequential A/D converters are often too slow and thus they
are replaced by fast parallel converters. When this is done, the arith-
metic can be clocked at much higher speed because the limiting A/D
converters are no longer part of the digit pipeline. In these cases, the
A/D conversion is usually done before the D/A conversion of the last
control value. That means that the timing of Fig. 7.3 is not respected.
The resulting solution is suboptimal but can be appropriate for some
special applications (e.g. very fast current loops).
Such a solution was also implemented in the PID demonstrator with
2 parallel A/D converters of 12-bit resolution. The sampling frequency,
which was obtained with a 10 MHz oscillator, was 435 kHz (corresponding
to a sampling period of 2.3 µs). The gate numbers are exactly the same as
indicated in Tab. 7.2 because the implementation uses the same algorithm
and the host interface has about the same complexity. In practice such
fast sampling is usually avoided because it causes numerical problems
even when the controller is applied to fast current loops.
7.2 Piezo Tip–Tilt Mirror
As a second benchmark, the control of a tip–tilt mechanism driven by
piezo translators has been chosen. The mechanism used is commercially
available under the part number S330 from Physik Instrumente [PI98],
Waldbronn, and was originally supplied with analog PID controllers. In
order to compensate existing system resonances, the design described
aims to replace these analog controllers by digital ones with the smallest
possible gate number. As will be shown in this section, this is an im-
plementation case where the controller is far more complex than in the
PID demonstrator of Sect. 7.1 but the computation time constraints are
not strong enough to force multiple operator copies for the digit-parallel
solution. That means that the on-line arithmetic implementation (see
Fig. 1.2d, page 4) has to compete with a standard digit-parallel solution
(see Fig. 1.2a, page 4).
In the first part of this section the piezo system is introduced and the
two-degrees-of-freedom controller selected is motivated with a system
model. Subsequently, the on-line arithmetic computation scheme chosen
is given. It will later be compared to optimized digit-parallel solutions.
In order to have a clear reference, a common hardware platform around
a re-programmable Xilinx FPGA was designed. A detailed description
of the hardware used as well as implementation details of the on-line
arithmetic and digit-parallel arithmetic controllers are given. The last
section is concerned with the comparison of the on-line and digit-parallel
implementations.
7.2.1 System and Controller Representation
The tip–tilt mirror used is designed for precise angular movements in
two axes. As shown in Fig. 7.4, the mechanical system consists of two
parts connected by a flexible structure. The upper part tilts with the
aid of two pairs of low voltage piezo translators (one pair shown in
the figure) which are equipped with strain gauge sensors to provide
position measurements. The length of the cylindrical part is 37.5 mm and
its diameter 25 mm. The interferometer measurements indicated are
only used for identification purposes and are not available afterwards.
The system allows two independent orthogonal tip–tilt movements of
about ±1 mrad.
Figure 7.4: Principle of piezo tip–tilt mechanism [PI98]
The piezoelectric translators are arranged in a bridge circuit. One
power supply channel serves as bridge supply and the other two chan-
nels supply the center taps with voltages in the range of 0–100 V. In
order to have a symmetrical range, a permanent offset of half the op-
eration range (50 V) is usually supplied. With an amplification factor
of 10 this corresponds to a control input of ±5 V. A pre-loaded casing
provides high stiffness in the directions perpendicular to the inclination
movements.
Fig. 7.5 shows the displacement curves for slow sinusoidal inputs
of increasing amplitude. They show a significant hysteresis loop which
reaches up to 15% for large amplitudes. Taking into account this hys-
teresis and the flexible mechanical subsystem, the tip–tilt mirror can
be modeled for control purposes by a nonlinear block representing the
hysteresis behavior which is in series with linear blocks representing
the complex mechanical subsystem (see Fig. 7.6). In this model, the
transfer functions G1 and G2 are not independent (common poles and
zeros) but coupled by the feedback of the flexible connection of the two
mechanical subsystems.
Figure 7.5: Hysteresis between voltage and displacement (axis 1; strain gauge sensor voltage versus input voltage)
For the modeling and identification of the mechanical subsystem,
the position of the upper part was measured by the additional interfer-
ometer indicated in Fig. 7.4. The measured Bode diagrams of Fig. 7.7
show significant resonance peaks in the range of 3 to 10 kHz on the in-
terferometer output. They have an important influence on the dynamic
behavior when left uncompensated. G1 and G2 can be modeled by high
order linear models (3rd order for G1, 10th order for G2). The model
fits for G1 ·G2 and G2 are also indicated in Fig. 7.7.
Both hysteresis and high resonance peaks are characteristic of piezo
actuators. The hysteresis depends mostly on the piezoelectric constant
(which is optimized for large displacements), whereas the location
of the resonance peaks is mainly influenced by the stiffness of the
casing and the driven mass.
In order to improve the precision of the actuator and to maintain
a high bandwidth, a feedback controller with integral action is
considered. The motivation for a robust design comes from the gain variations
caused by the uncompensated hysteresis in the system. They have to
be taken into account for the controller design. The resulting controller
structure is illustrated in Fig. 7.8.
Figure 7.6: Model structure for piezo tip–tilt mechanism
The third order two-degrees-of-freedom controller used is designed
to make the inner loop low pass in order to damp the resonance peaks of
the flexible system (G2) sufficiently. Additional design criteria are a
gain margin of 2.5 and good robustness to noise on the control input and
sensor output. The controller employed,
$R(q^{-1})\,u(t) = T(q^{-1})\,r(t) - S(q^{-1})\,y(t)$, including the
additional scaling factors of the analog I/Os (see Sect. 7.2.3), is:
\[
\begin{aligned}
R(q^{-1}) &:= 1 - 1.1636\,q^{-1} + 0.16356\,q^{-2} \\
S(q^{-1}) &:= 0.9665\,q^{-1} - 0.6229\,q^{-2} + 0.5895\,q^{-3} - 0.3513\,q^{-4} \\
T(q^{-1}) &:= 0.7758\,q^{-1} - 0.9558\,q^{-2} + 0.7619\,q^{-3}
\end{aligned} \tag{7.4}
\]
The controller output is delayed by one sampling period in order to
guarantee a correct timing (output at the end of the sampling period)
and because the on-line arithmetic implementation has one sample delay
by design (see Sect. 3.1.1).
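The controller of Eqn. (7.4) can be written as a difference equation for u(t). The following floating-point sketch is meant as a reference model only; it does not reproduce the on-line arithmetic implementation, and the function name is an assumption. Two properties of the coefficients can be read off directly: R(1) is approximately zero, which reflects the integral action, and T(1) is approximately equal to S(1), which gives unit static gain from reference to measurement.

```python
# Sketch: floating-point reference model of the RST controller of Eqn. (7.4).
# R u = T r - S y, solved for u(t); zero initial conditions are assumed.

R = [1.0, -1.1636, 0.16356]
S = [0.0, 0.9665, -0.6229, 0.5895, -0.3513]
T = [0.0, 0.7758, -0.9558, 0.7619]

def rst_controller(r, y):
    """Return the control sequence u for equally long sequences r and y."""
    u = []
    for t in range(len(r)):
        acc = 0.0
        for i in range(1, len(T)):        # reference terms T(q^-1) r(t)
            if t - i >= 0:
                acc += T[i] * r[t - i]
        for i in range(1, len(S)):        # measurement terms S(q^-1) y(t)
            if t - i >= 0:
                acc -= S[i] * y[t - i]
        for i in range(1, len(R)):        # past outputs, from R(q^-1) u(t)
            if t - i >= 0:
                acc -= R[i] * u[t - i]
        u.append(acc)
    return u

if __name__ == "__main__":
    # Response to a unit pulse on the reference with the measurement held at zero.
    print(rst_controller([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
    print("R(1) =", sum(R), " T(1) =", sum(T), " S(1) =", sum(S))
```

Since both S and T start with a q^-1 term, the output never depends on the samples of the current period, which matches the one-sample delay described above.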
Figure 7.7: Open loop Bode diagrams of G1·G2 and G2 (magnitude in dB and phase in degrees versus frequency in Hz, interferometer measurements)
Figure 7.8: Control system configuration for tip–tilt mirror
The control algorithm was implemented for test purposes in a fast
TMS320C40 digital signal processor (running at 60 MHz, 60 MFlops)
with a sampling frequency of 20 kHz. The A/D and D/A converters
have a 16-bit resolution. The system worked correctly and the closed
loop bandwidth obtained is about 700 Hz compared to 300 Hz for simple
PID control.
7.2.2 On-Line Arithmetic Computation Scheme
This algorithm is realized by the complex operator shown in Fig. 7.9.
Registers with fixed size (see Sect. 3.1.3) are chosen for the state up-
dates. For the interface used their sizes are determined by:
\[
\delta_{reg1} = \delta_{period} = \delta_{res} + \delta_{D/A} + \delta_{A/D} + \delta_{Alg} = 16 + 1 + 2 + 5 = 24
\]
respectively
\[
\delta_{reg2} = \delta_{period} - \sum \delta_{forward} = 24 - 5 = 19
\]
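The register sizes above follow from simple delay bookkeeping, which can be sketched as follows. The function and argument names are hypothetical; the numbers are those given for the piezo interface, together with the 20 kHz sampling frequency used for this controller.

```python
# Sketch: delay bookkeeping for the fixed-size state registers.
# Names are hypothetical; numerical values are those of the piezo interface.

def register_sizes(delta_res, delta_da, delta_ad, delta_alg, delta_forward):
    """Return (delta_period, delta_reg1, delta_reg2) in clock cycles."""
    delta_period = delta_res + delta_da + delta_ad + delta_alg
    delta_reg1 = delta_period                   # full-period state register
    delta_reg2 = delta_period - delta_forward   # shortened by the forward-path delay
    return delta_period, delta_reg1, delta_reg2

def clock_for(delta_period, f_sampling):
    """Clock frequency in Hz so that one period fits into one sampling interval."""
    return delta_period * f_sampling

if __name__ == "__main__":
    period, reg1, reg2 = register_sizes(16, 1, 2, 5, delta_forward=5)
    print(period, reg1, reg2, clock_for(period, 20e3))
```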
The two TCtoBS blocks on the controller entry convert the incoming
2’s complement numbers of the A/D converters into borrow-save repre-
sentation. The BStoDA block adapts the borrow-save output to the 2’s
complement representation of the D/A converters. The initialization,
normalization and out-switch blocks complete the multi-adder structure
to a modular on-line operator as explained in Sect. 3.1. The necessary
clock speed for a sampling period of 50µs is 20 kHz · 24 = 480 kHz.
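The conversion performed by the TCtoBS blocks can be illustrated on integers. In borrow-save representation every digit is the difference of a positive and a negative bit. One common way to convert a 2's complement word, assumed here and not necessarily identical to the library implementation, uses the fact that only the sign bit carries a negative weight: it is moved to the negative part while all other bits form the positive part. The function names are hypothetical.

```python
# Sketch: 2's complement <-> borrow-save on integers, digits listed MSB first.
# Assumption: the sign bit becomes the only non-zero bit of the negative part.

def tc_to_bs(value, n_bits):
    """Return (pos, neg) bit lists with digit_i = pos[i] - neg[i]."""
    lo, hi = -(1 << (n_bits - 1)), 1 << (n_bits - 1)
    if not lo <= value < hi:
        raise ValueError("value does not fit into n_bits")
    word = value & ((1 << n_bits) - 1)                     # 2's complement bit pattern
    bits = [(word >> i) & 1 for i in range(n_bits - 1, -1, -1)]
    pos = [0] + bits[1:]                                   # magnitude bits keep positive weight
    neg = [bits[0]] + [0] * (n_bits - 1)                   # sign bit has negative weight
    return pos, neg

def bs_to_int(pos, neg):
    """Value of a borrow-save number: positive part minus negative part."""
    n = len(pos)
    return sum((p - q) * (1 << (n - 1 - i))
               for i, (p, q) in enumerate(zip(pos, neg)))

if __name__ == "__main__":
    pos, neg = tc_to_bs(-3, 4)
    print(pos, neg, bs_to_int(pos, neg))
```

No carry propagation is needed in this direction, so the conversion can proceed digit by digit, most significant digit first.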
 
Figure 7.9: Implementation scheme of on-line controller
7.2.3 Hardware Implementation
Contrary to the PID demonstrator, the hardware environment is based
on a re-programmable FPGA. Therefore, it can be used as a general
purpose platform to test different on-line arithmetic algorithms. For
this reason, the hardware is described in detail in this subsection.
Environment for On-Line Implementation
The motherboard is a Europe format board, called Labomat [Gök95],
developed by the LSL (Logic System Laboratory), EPFL, for teaching
purposes. It holds a Motorola 68331 (32-bit) micro-controller, a
re-programmable Xilinx 5210 FPGA with an equivalent gate count of about
16 k, a 256 kByte flash memory and a 256 kByte SRAM memory as well
as an EPROM hosting a monitor program for the serial link. The board
can be programmed via RS232 from any computer and about 50 pins
of the FPGA are accessible via connectors.
For the analog inputs and outputs to the piezo system an analog
daughter board was developed. It holds 4 anti-aliasing filters with pre-
amplifiers (fB = 5 kHz), 4 16-bit A/D-converters with serial output
during conversion (2 for position, 2 for reference inputs), 4 16-bit D/A-
converters with serial inputs as well as 2 instrumentation amplifiers
which produce the analog difference of two D/As (positive minus neg-
ative part of redundant output, see Sect. 2.4). The input and output
voltage ranges were adapted to the levels of the piezo system. A simpli-
fied operational scheme of the analog I/O-board is shown in Fig. 7.10.
For the piezo demonstrator very little of the motherboard function-
ality is employed. As summarized in Fig. 7.11, the serial link is only
used for programming purposes, the micro-controller with its included
PLL circuit takes the role of a clock generator and the FPGA hosts the
actual controller as well as the necessary glue logic for the interface to
the I/O-board.
Software has to be written for the micro-controller (clock generation)
as well as for the FPGA (controller and interface). Source code for the
micro-controller can be written in C and can be compiled with the GNU
C compiler (gcc for 68000). Before the object code can be downloaded it
has to be converted to hexadecimal numbers. Source code for the FPGA
is written in VHDL using the on-line arithmetic library described (see
Sect. 4.5). This code is subsequently synthesized by Synopsys or another
synthesis tool before it is mapped to the Xilinx architecture by Design
Manager, a software tool supplied by Xilinx. Before the FPGA object
code can be downloaded it also has to be converted to hexadecimal
numbers. The download of micro-controller and FPGA hex-code is
done in an interactive dialog in a terminal emulator using a monitor
program which runs in the EPROM of the motherboard.
Figure 7.10: Functional schematic of the input/output-board
Figure 7.11: Motherboard configuration for the piezo demonstrator (RS232: interface for program and configuration download, debugging; micro-controller: generation of clock, debugging of FPGA; FPGA: on-line controller, interface to I/O-board; I/O-board: analog interface for piezo mirror; memory: buffer for debugging)
In order to keep the development of a new controller simple, repeated
tasks were simplified by scripts and a common interface for controllers
using the I/O-board was written. Changing the clock frequency was
also reduced to changing a constant number in a C-program.
Input–Output Interface
In order to follow the implementation guidelines given in Sect. 3.1, an
interface is provided which feeds the input values in serial form into the
arithmetic module synchronized to a control signal which indicates valid
digits. The outputs of the arithmetic module are also received in serial
form and the D/A conversion is controlled by the output of the control
line. In order to keep the interface as modular as possible, the input
part (start and read of A/D converters) was controlled by a counter
and the output part (write and start of D/A converter) as well as the
counter reset were controlled by the output signals of the arithmetic
module. Along these lines, a new controller can be implemented by
simply replacing the controller block in Fig. 7.12 and by adjusting the
sampling period by choosing an appropriate clock frequency for the
micro-controller. The detailed timing of the universal interface is shown
in Fig. 7.13 for an arithmetic module with a delay of δ = 5.
Figure 7.12: Universal interface for on-line arithmetic controller
Figure 7.13: Detailed timing of universal interface, period = 19 + δcontroller ((n) = negative slope)
For the implementation of the on-line arithmetic controller for both
axes, this interface has to be extended by an indicator signal. This
indicator toggles between the two values 0 and 1 and is switched with
the falling edge of the control line output. The toggle signal drives
a connection matrix which switches the state registers from a storage
mode to a controller mode (for more details see Sect. 4.3).
In the case of the piezo demonstrator the controller block of Fig. 7.12
was replaced by the two-degrees-of-freedom controller described above.
The different input and output ranges of the analog extension board (see
Fig. 7.10) introduce an additional scaling factor which was incorporated
into the controller constants of Eqn. 7.4 (input ŷ = 1 corresponds to
y = 2 · 2.5 V, output û = 1 corresponds to u = 6 V).
Implementations were undertaken for a one axis and a two axes
solution. For the two axes controller, instead of using two separate
controllers, reuse of the one axis controller was realized following the
description in Sect. 4.3.
With the quantization of the clock divider in the micro-controller
(quantization steps are 4 fReference = 4 · 32.768 kHz = 131 kHz) a value
of 524 kHz was chosen for the one axis solution (1.048 MHz for the two
axes). This leads to a value of 3 (6 for the two axes controller) for the
upper byte of the SYNCR clock register specified in the C-program for
the micro-controller.
FPGA             on      sp      fp      on-two   sp-two   fp-two
Lookup Tables    528     986     3221    571      1161     3233
Flip Flops       632     337     224     957      510      339
∑ LUT + FF       1160    1323    3445    1528     1671     3572
Clock Cycles     24      53      39      48       106      78
Table 7.3: Comparison of on-line and digit-parallel implementations
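The selection of the micro-controller clock (multiples of 4 fReference) can be sketched as follows. The sketch assumes that the smallest available clock at or above the required one is taken. This rounding rule is an assumption, but it reproduces the values of 524 kHz and 1.048 MHz quoted for 24 and 48 clock cycles per sampling period at 20 kHz. The mapping to the SYNCR register contents is not modeled.

```python
# Sketch: pick the micro-controller clock on a grid of 4 * f_reference.
# Assumption: the smallest grid value not below the required clock is chosen.
import math

F_REFERENCE = 32768.0        # Hz, reference oscillator
STEP = 4 * F_REFERENCE       # Hz, quantization step of the clock divider

def select_clock(cycles_per_period, f_sampling):
    """Return the selected clock frequency in Hz."""
    required = cycles_per_period * f_sampling
    return math.ceil(required / STEP) * STEP

if __name__ == "__main__":
    for cycles in (24, 48):                      # one axis, two axes
        clock = select_clock(cycles, 20e3)
        print(cycles, clock, clock / cycles)     # cycles, clock in Hz, resulting sampling rate
```

With the quantized clock the actual sampling frequency becomes about 21.8 kHz instead of the nominal 20 kHz.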
7.2.4 System Performance
The resulting controllers based on on-line arithmetic (for one axis and
two axes) were compared to digit-parallel implementations. For this
purpose the two-degrees-of-freedom controller was first realized in a se-
quential scheme using one digit-parallel multiply-add operator which is
subsequently used eight times in order to compute the new controller
output. Intermediate results and state values are stored in registers.
The connection matrix (between operands, registers and operators) is
controlled by a fixed sequence specific to the two-degrees-of-freedom
controller. Therefore, no additional constant registers had to be added
with respect to the on-line implementation.
The second structure uses a full-parallel implementation similar to
the on-line realization but based on digit-parallel operators, i.e. for each
constant a multiplier was implemented and finally connected to a binary
tree of adders.
Tab. 7.3 shows a comparison of the different solutions. The on-line
arithmetic controllers are indicated by on and on-two, the sequential
digit-parallel by sp and sp-two, and the full-parallel by fp and fp-two
(two stands for two axes). The number of necessary clock cycles
is slightly misleading because the digit-parallel implementations were
based on the same I/O-board as the on-line solutions (with serial A/D
and D/A converter interface). For an interface optimized for
the digit-parallel implementation (digit-parallel interface) the number
of clock cycles would be similar in the three cases.
The two-degrees-of-freedom controller consists only of linear oper-
ations and the computation time constraints are not hard enough to
require multiple operators in the digit-parallel scheme (sp and sp-two).
Therefore, this application represents a disadvantageous situation for
an on-line implementation. However, the size of the resulting on-line
circuit is still slightly smaller than in the digit-parallel cases. Based on
the modular on-line operators the controller construction is much sim-
pler than in the sequential case and there is still a much smaller number
of necessary interconnections (because of serial transmission).
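The size comparison can be made explicit with the figures of Tab. 7.3. The short sketch below recomputes the totals of lookup tables and flip flops and the size of the on-line solutions relative to the sequential digit-parallel ones. The data layout and function names are chosen for this sketch only.

```python
# Sketch: relative circuit size from the entries of Tab. 7.3.
# (lookup tables, flip flops) per implementation; layout chosen for illustration.

RESOURCES = {
    "on": (528, 632),
    "sp": (986, 337),
    "fp": (3221, 224),
    "on-two": (571, 957),
    "sp-two": (1161, 510),
    "fp-two": (3233, 339),
}

def total(name):
    """Sum of lookup tables and flip flops."""
    return sum(RESOURCES[name])

def relative_size(name, reference):
    """Size of one implementation relative to another."""
    return total(name) / total(reference)

if __name__ == "__main__":
    print(f"on / sp:         {relative_size('on', 'sp'):.3f}")
    print(f"on-two / sp-two: {relative_size('on-two', 'sp-two'):.3f}")
```

The on-line controller thus needs about 88% of the resources of the sequential solution for one axis and about 91% for two axes, which quantifies the statement that it is slightly smaller.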

Chapter 8
Conclusions
This final chapter summarizes the contributions of this work and re-
lates the material presented to practical requirements. The methods
proposed are critically discussed and directions for further research are
proposed.
8.1 Achievements
In this thesis several aspects concerning the implementation of real-time
digital controllers in on-line arithmetic have been discussed. It has been
shown that for a large class of controllers on-line arithmetic can satisfy
contradictory specifications like high speed, small size and low power
consumption. In this context the proposed overlap of A/D conversion
and on-line arithmetic computation is believed to be novel at this time.
It makes a time period available for computations which was unused by
other approaches.
The main effort of this thesis was spent in simplifying the imple-
mentation of controllers in on-line arithmetic. In the past, for every
specific on-line arithmetic design an arithmetic specialist was required
in order to figure out the appropriate structure. No general design rules
were available from the literature. In this thesis, two design concepts
were introduced which make this kind of implementation accessible to
a large number of engineers. The first design method, called Modular
On-Line Arithmetic Operators, has not been mentioned in the literature
before. It extends the mathematical on-line arithmetic operators in such
a way that they can be used in a modular fashion as in the full-parallel
implementation of digit-parallel operators. It assumes a common scale
for intermediate results as most fixed point methods do. The other
method, called Global Execution Control, has already been implicitly
used by several authors (e.g. [BEW89, Cha91]), but it needed to be
clearly described in order to prevent designers from re-doing the same
kind of conceptual work from scratch for every new design. It is partic-
ularly well suited to problems with highly varying scale of intermediate
results.
In the context of necessary extensions of on-line arithmetic, it was
recognized that the existing normalization algorithms were not appropriate
for all possible cases. Especially in situations where several additional
leading digits were generated in a feed-forward path (e.g. with
multi-adders), the existing algorithms were either time consuming or
they simply failed. Therefore, the existing normalization algorithm of
Merrheim [Mer94] was extended for the general form of the problem.
Real-time controllers can be computed in an abundance of different
representations. This is due to the separation of the computation in
direct computations and state-updates. However, only a few of these
representations are appropriate for on-line arithmetic implementations
because they show large differences in calculation time and numerical
conditioning. In this context it was shown that controllers in Jordan
form with the use of the delta operator are numerically well conditioned
for on-line arithmetic computations.
The design concepts given in Chap. 3 assume an existing operator
library. Such a library (written in VHDL) was developed together with
A. Tisserand (ENS, Lyon). In order to enable readers who do not have
access to this library to reconstruct their own, the basic ideas of the
library structure have been explained (Sect. 4.5). This is probably the
first on-line library of its type and it could be extended to include a
wider choice of operators.
In this thesis, the question of the appropriate radix for on-line
computations is discussed in the context of real-time digital controllers.
It was shown that with higher radices the computation time can be
considerably reduced, at the price of an increase in circuit size.
It was also pointed out that the choice of the digit set and the
bit-level coding have an important influence on the circuit size. For most
real-time controllers a radix 2 implementation is sufficiently fast and is
thus preferred. However, for some signal processing problems with demanding
speed and power consumption requirements a radix 4 implementation
can be advantageous.
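The digit-count side of this trade-off can be made concrete with a small sketch. It assumes, as a simplification not taken from the thesis, that the number of clock cycles for a digit-serial computation is roughly the on-line delay of the operator chain plus the number of operand digits. The function names and the delay value of 8 cycles are hypothetical. Whether the digit period and the on-line delay stay comparable at radix 4 depends on the operator design and is not modeled here.

```python
def digits_needed(word_bits, radix):
    """Number of radix-r digits needed to cover word_bits bits of resolution."""
    if radix < 2 or radix & (radix - 1):
        raise ValueError("radix must be a power of two")
    bits_per_digit = radix.bit_length() - 1
    return -(-word_bits // bits_per_digit)  # ceiling division


def serial_cycles(word_bits, radix, online_delay):
    """Approximate clock cycles until the last result digit leaves the operator chain."""
    return online_delay + digits_needed(word_bits, radix)


# Hypothetical: the same on-line delay (8 cycles) is assumed for both radices.
cycles_radix2 = serial_cycles(word_bits=16, radix=2, online_delay=8)
cycles_radix4 = serial_cycles(word_bits=16, radix=4, online_delay=8)
```

Under these assumptions a 16-bit operand needs 16 digits in radix 2 but only 8 in radix 4, which is where the reduction in computation time comes from.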
In the comparison with digit-parallel solutions in Chap. 6, it was clearly
pointed out that speed as the only criterion is not sufficient to motivate
an on-line arithmetic approach. It is mainly the combination with small
size and low power consumption which makes on-line arithmetic attractive.
When looking through the on-line arithmetic literature, it is surprising
how few on-line arithmetic computations have been physically implemented.
Therefore, it is an important contribution that the design guidelines
formulated here were applied to two real implementations and to
several others which are not described here.
8.2 Practical Application Perspective
In today’s instrumentation industries, there are trends towards higher
integration of additional functionality into micro-electronics and towards
rapid prototype development. Design software based on the implementation
guidelines given here could play an important role in simplifying hardware
development. As already mentioned in Chap. 6, this will mainly
concern small or mid-size controllers. However, in most micro-systems
the controllers are quite simple because the integration of the sensors
allows direct measurements and because the relevant system models are
of low complexity given their small operating range.
The field of possible applications is large. All kinds of integrated
applications are good candidates. These host mechanical and
micro-electrical subsystems on the same substrate, and often several simple
controllers are employed for each subsystem. Examples of this class
are actively controlled arrays of micro-mirrors
[KKWB96, Mig94] or magnetic micro-bearings (e.g. an integrated
gyroscope [AVA+98]). Another possible implementation area is the class
of systems where an ASIC is already justified by other logical functions
necessary for the particular application and where the on-line arith-
metic controller can serve as a kind of real-time specific coprocessor.
This is the case in most systems where ASICs are used; stand-alone
dynamic controllers are exceptions. All applications based on ASICs
have the drawback that circuit development and manufacturing are still
very expensive and time consuming. Further progress in manufacturing
technologies will probably make this kind of micro-electronics accessible
to a wider class of applications.
Inspired by the work presented in this thesis, other application
perspectives were opened up by a team of researchers from CSEM, Neuchâtel.
They have developed the concept for a reconfigurable chip (including
analog and digital I/O), called FPOP (field programmable on-line
operator), based on on-line arithmetic operators. It should cover a large
range of applications in the fields of digital control and signal
processing. The emphasis has been put on very low power consumption,
but with the goal of being as fast as the fastest DSPs on the market
(e.g. TMS320C6201 with 1600 MIPS). All this functionality should be
integrated in only one package of small dimensions in order to simplify
the integration with the target system. For this reconfigurable circuit
a patent is pending and further details can be found in [TMP99]. This
fast on-line arithmetic could be used for the computation of the
switching instants in electrical drives. Most of the existing implementations
in this field still use lookup tables because of the high switching
frequencies. However, implementations for slower devices (like drives of
locomotives) illustrate very well the possible improvements of the
dynamic behavior. Another example is the intensity control of lasers used
for welding. The dynamics of these systems are similar to those of a current
loop and thus very fast. Only very specific hardware solutions can be used
for a digital implementation.
8.3 Further Research
During the past 3 years, since work started on this thesis, most of the
important aspects concerning the on-line arithmetic implementation of
real-time digital controllers have been investigated. However, as in
any other domain, several specific questions have appeared which still
require more intensive research.
As was shown above, the on-line arithmetic operator library plays an
important role for simple hardware implementations. Until now, only
the basic operators were implemented and a lot of manpower is needed
to extend this library to a larger number of operators. An important
issue in this context is also the question of how to treat operators which
have a restricted input range. They need some pre-scaling which will
influence their operator delay. This effect can subsequently lead to
synchronization problems.
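The synchronization issue can be sketched as follows. In a digit-pipelined design, operands arriving at the same operator must be aligned digit by digit, so a path whose delay grows (for example through pre-scaling) forces additional shift registers on the other paths. The helper below is a hypothetical illustration under that assumption. The path names and delay values are invented and do not come from the operator library.

```python
def balancing_registers(path_delays):
    """Shift-register length to add to each path so all paths match the slowest one."""
    longest = max(path_delays.values())
    return {name: longest - delay for name, delay in path_delays.items()}


# Hypothetical on-line delays (in clock cycles) of two paths feeding one adder;
# the second path is assumed to contain a pre-scaled operator costing two extra cycles.
delays = {"direct_path": 3, "prescaled_path": 5}
padding = balancing_registers(delays)
```

Here the direct path would need two additional register stages, which also lengthens the overall input-to-output delay of the controller.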
In the long term a high level programming language (e.g. MATLAB
or C) must replace the VHDL description in order to gain user
acceptance and to reduce time-to-market of new applications. This will
require the development of the necessary compilation software. First concepts
for such a programming environment were already introduced in the
FPOP project.
In terms of low power applications there is a large potential in on-line
arithmetic implementations. In this thesis only the basic motivation for
low power designs has been given. In order to obtain more quantitative
results, measurements on real circuits or at least very detailed
simulations become necessary. In the FPOP project these will influence the
choice of the radix and the specific operator structure.
In this thesis only fixed-point implementations have been discussed.
This is mainly because in on-line floating-point implementations, first,
the operator size is much bigger (mantissa and exponent computation)
and, second, digit-pipelining becomes much more difficult. Depending
on the scale of the intermediate results, a rescaling
of the mantissa sometimes becomes necessary, which is directly related to a
shift of the number and thus to a synchronization problem. The more complex
execution control and the larger operator size risk eliminating the
advantages of the on-line approach. Further work investigating these
aspects would be of interest.

List of Abbreviations
A/D Analog to Digital
Addi Address i
add add
ALU Arithmetic and Logic Unit
Alg. Algorithm
ASIC Application Specific Integrated Circuit
BStoDA Borrow-Save to D/A converter
Chap. Chapter
CARESSE Calculateur Redondant Scientifique en Série is French for
redundant serial scientific computer
CLB Configurable Logic Block
CSEM Centre Suisse d’Électronique et de Microtechnique
D/A Digital to Analog
DSP Digital Signal Processor
ENS École Normale Supérieure
EPFL École Polytechnique Fédérale de Lausanne
Eqn. Equation
FIFO First In First Out
Fig. Figure
fp full-parallel digit-parallel arithmetic
FPGA Field Programmable Gate Array
FPOP Field Programmable On-Line Operator
Hz Hertz
init initialization
init(ext) extended initialization
instr instruction
I/O Input Output
kHz kilo Hertz
kByte kilo Byte
LSD Least Significant Digit
LSDF Least Significant Digit First
LSL Laboratoire des Systèmes Logiques
MIMO Multiple Input Multiple Output
MHz Mega Hertz
MIPS Millions of Instructions per Second
mmp minus minus plus
MPC Model Predictive Control
MSD Most Significant Digit
MSDF Most Significant Digit First
mult multiply
mv move
Op Operator
ol on-line arithmetic
PID Proportional Integral Derivative
ppm plus plus minus
reg register
res resolution
Tab. Table
TCtoBS Two’s Complement to Borrow Save
s seconds
Sect. Section
SISO Single Input Single Output
sp sequential digit-parallel arithmetic
SRAM Static Random Access Memory
sub subtract
Subsect. Subsection
SVD Singular Value Decomposition
V Volt
VHDL Very high speed integrated circuit Hardware Description
Language

List of Symbols
Italic Symbols
$a_i^+$ Positive bit of the $i$th digit
$a^+$ Positive bits of $a$
A Dynamic matrix of state space description
B Input matrix of state space description
C Output matrix of state space description
ctr ahead Additional control line to stop initialization of subsequent
operators
ctr in Control line input
ctr out Control line output
D Transition matrix of the state space description
$f(x_k, s_k, r_k)$ State update function of $x_k$, $s_k$, and $r_k$
$f_s$ Sampling frequency
$g(x_k, s_k, r_k)$ Output function of $x_k$, $s_k$, and $r_k$
h Sampling time
n Number of operand digits
N Number of operands
O() In the order of magnitude of
$q^{-1}$ Backward shift operator
r Radix
$t_{computation}$ Computation time for one axis in reuse scheme
$T_4$ Digit period for radix 4
$T_{calc4}$ Calculation time for radix 4 implementation
Greek Symbols
$\delta^{-1}$ Inverse delta operator
$\delta$ On-line delay
$\delta_2$ On-line delay for radix 2
$\delta_4$ On-line delay for radix 4
$\delta_{A/D}$ Delay of the sample and hold unit of the A/D converter
in number of clock cycles
$\delta_{adder}$ On-line delay of an adder
$\delta_{arithmetic}$ On-line delay of the controller input-to-output path
$\delta_{Avizienis}$ Delay of Avizienis adder
$\delta_{D/A}$ Delay of D/A converter in number of clock cycles
$\delta_{Forest}$ Delay of Forest adder
$\delta_{forward}$ On-line delay in the forward part of a loop
$\delta_{Hybrid}$ Delay of Hybrid adder
$\delta_{improve}$ Speed improvement in number of clock cycles
$\delta_{max}$ Maximum on-line delay of an operator in the algorithm
$\delta_n$ On-line delay ($n$ clock cycles) due to the serial character
of operands
$\delta_{op_i}$ On-line delay of operator $i$
$\delta_{opt}$ Optimal on-line delay of a certain operation
$\delta_{path}$ On-line delay of the operators from the actual position to
the controller output
$\delta_{period}$ Length of one sampling period in number of clock cycles
$\delta_{PID}$ On-line delay of the two-degrees-of-freedom controller
$\delta_{reg}$ Size of shift registers in number of clock cycles
$\delta_{shift}$ Scaling factor of the operator output: scaling = $2^{\delta_{shift}}$
$\delta_{zeros}$ Number of intermediate zeros
$\delta_{zeros,min}$ Minimum number of intermediate zeros
$\tau$ Digit period
Others
D Digit set
Q Rational numbers
R Real numbers

Bibliography
[Act96] Actel. ACT Family FPGA Databook. Actel Corporation,
Sunnyvale, 1996.
[Agu94] M. Aguilar. Conception et Simulation d’une Machine
Massivement Parallèle en Grande Précision. PhD thesis,
École Normale Supérieure, Lyon, 1994.
[AVA+98] C. Aymon, R. Vuillemin, B. Aeschlimann, M. Kümmerle,
A. Würsch, A. Bletis, H. Bleuler, E. Fullin, and J. Berquist.
A contact free suspended micro motor. In Proc. of the 6th
International Conference on New Actuators, Actuator 98,
Bremen, 1998.
[Avi61] A. Avizienis. Signed-digit number representations for fast
parallel arithmetic. IRE Transactions on Electronic Com-
puters, 10:389–400, 1961. Reprinted in E.E. Swartzlan-
der, Computer Arithmetic, Vol. 2, IEEE Computer Society
Press Tutorial, Los Alamitos, 1990.
[Baj93] J.C. Bajard. Evaluation de fonctions dans des systèmes
redondants d’écriture des nombres. PhD thesis, École
Normale Supérieure, Lyon, 1993.
[BDKM94] J.C. Bajard, J. Duprat, S. Kla, and J.M. Muller. Some op-
erators for on-line radix-2 computations. Journal of Paral-
lel and Distributed Computing, 22(2):336–345, 1994.
[BEW89] R.H. Brackert, M.D. Ercegovac, and A.N. Wilson. Design
of a multiply-add module for recursive digital filters. In
Proc. of the 9th IEEE Symposium on Computer Arithmetic,
pages 34–41, Santa Monica, 1989. IEEE Service Center,
Piscataway.
[Bra89] R. Brackert. Design and Implementation of a High-Speed
Recursive Digital Filter Using On-Line Arithmetic. PhD
thesis, UCLA, Los Angeles, 1989.
[BRV89] P. Bertin, D. Roncin, and J. Vuillemin. Introduction to
programmable active memories. Technical report, DEC Re-
search Laboratory, Paris, 1989.
[BWE89] R.H. Brackert, A.N. Wilson, and M.D. Ercegovac. A high-
speed recursive digital filter using on-line arithmetic. In
Proc. of the ISCAS’89, pages 1552–1555, Portland, 1989.
IEEE Service Center, Piscataway.
[CDHM91] G. Corbaz, J. Duprat, B. Hochet, and J.M. Muller. Im-
plementation of a VLSI polynomial evaluator for real-time
applications. In Proc. of ASAP’91, 1991.
[Cha91] L.C. Cha. A recursive digital filter using on-line arithmetic.
Master’s thesis, UCLA, Los Angeles, 1991.
[DM88] J. Duprat and J.M. Muller. Hardwired polynomial eval-
uation. Journal of Parallel and Distributed Computing,
5(3):291–309, 1988. Special issue on parallelism in com-
puter arithmetic.
[DMT97] M. Daumas, J.M. Muller, and A. Tisserand. Very high
radix on-line arithmetic for accurate computations. In
Proc. of the 15th IMACS World Congress on Computation
and Applied Mathematics, 1997.
[DS88] P.B. Denyer and S.G. Smith. Advanced serial-data com-
putation. Journal of Parallel and Distributed Computing,
5(3):228–249, 1988. Special issue on parallelism in com-
puter arithmetic.
[EG80] M.D. Ercegovac and A.L. Grnarov. On the performance of
on-line arithmetic. In Proc. of the 1980 Parallel Processing
Conference, pages 55–62. IEEE, New York, 1980.
[EG83] M.D. Ercegovac and A.L. Grnarov. On-line multiplicative
normalization. In Proc. of the 6th Symposium on Computer
Arithmetic, pages 151–155. IEEE, New York, 1983.
[EL85] M.D. Ercegovac and T. Lang. A division algorithm with
prediction of quotient digits. In Proc. of the 7th IEEE
Symposium on Computer Arithmetic, pages 51–56, Urbana,
1985. IEEE Computer Society Press, Silver Spring.
[EL87a] M.D. Ercegovac and T. Lang. On-line scheme for comput-
ing rotation angles for SVDs. In Proc. of the SPIE Con-
ference on Real-Time Signal Processing, volume 826, pages
160–169, San Diego, 1987.
[EL87b] M.D. Ercegovac and T. Lang. On-the-fly conversion of re-
dundant into conventional representations. IEEE Transac-
tions on Computers, C-36(7):895–897, 1987.
[EL88a] M.D. Ercegovac and T. Lang. On-line arithmetic: A design
methodology and applications in digital signal processing.
In R.W. Brodersen and H.S. Moscowitz, editors, VLSI Sig-
nal Processing, volume III, pages 252–263. IEEE Press, New
York, 1988.
[EL88b] M.D. Ercegovac and T. Lang. On-line scheme for comput-
ing rotation factors. Journal of Parallel and Distributed
Computing, 5:209–227, 1988.
[EL94] M.D. Ercegovac and T. Lang. Division and square root:
digit-recurrence algorithms and implementations. Kluwer
Academic Publishers, Boston, 1994.
[EMT95] M.D. Ercegovac, J.M. Muller, and A. Tisserand. FPGA im-
plementation of polynomial evaluation algorithm. In Field
Programmable Gate Arrays for Fast Board Development
and Reconfigurable Computing, volume 2607, pages 177–
188. SPIE, Philadelphia, 1995.
[Erc77] M.D. Ercegovac. A general hardware-oriented method for
evaluation of functions and computations in a digital com-
puter. IEEE Transactions on Computers, C-26(7):667–680,
1977.
[Erc78] M.D. Ercegovac. An on-line square rooting algorithm. In
Proc. of the 4th Symposium on Computer Arithmetic. IEEE
Computer Society Press, 1978.
[Erc84] M.D. Ercegovac. On-line arithmetic: an overview. In
Real Time Signal Processing VII, volume 495, pages 86–
93. SPIE, 1984.
[Erc91] M.D. Ercegovac. On-line arithmetic for recurrence prob-
lems. In Advanced Signal Processing Algorithms, Architec-
tures, and Implementations II, volume 1566, pages 263–274.
SPIE, 1991.
[ET77] M.D. Ercegovac and K.S. Trivedi. On-line algorithms for
division and multiplication. IEEE Transactions on Com-
puters, C-26(7):681–687, 1977.
[ET87] M.D. Ercegovac and P.K.G. Tu. A radix-4 on-line division
algorithm. In Proc. of the 8th Symposium on Computer
Arithmetic, pages 181–187. IEEE Computer Society Press,
Washington, 1987.
[ET89] M.D. Ercegovac and P.K.G. Tu. Design of on-line division
unit. In Proc. of the 9th Symposium on Computer Arith-
metic, pages 42–49. IEEE Computer Society Press, Wash-
ington, 1989.
[FE92] J.S. Fernando and M.D. Ercegovac. On-line arithmetic
modules for recursive digital filters. In Proc. of the 26th An-
nual Asilomar Conference on Signals, Systems and Com-
puters, volume 2, pages 681–685. IEEE Society Press, Los
Alamitos, 1992.
[For97] R. Forest. Arithmétique on-line en base 4 pour les
controlleurs digitaux en automatique. Master’s thesis, École
Polytechnique Fédérale, Lausanne, 1997.
[FPW98] G. Franklin, D. Powell, and M. Workman. Digital Control
of Dynamic Systems. Addison-Wesley, Menlo Park, 1998.
[GHM89] A. Guyot, Y. Herreros, and J.M. Muller. Janus, an on-
line multiplier/divider for manipulating large numbers. In
Proc. of the 9th Symposium on Computer Arithmetic, pages
106–111. IEEE Computer Society Press, Washington, 1989.
[Gök95] M. Göke. LABOMAT, Manuel d’utilisation. EPFL-LSL,
Lausanne, 1995.
[GT96] B. Girau and A. Tisserand. On-line arithmetic based repro-
grammable hardware implementation of multilayer percep-
tron back-propagation. In Proc. of the 5th International
Conference on Microelectronics for Neural Networks and
Fuzzy Systems, pages 168–175, Lausanne, 1996. IEEE Com-
puter Society Press, Los Alamitos.
[GT99] B. Girau and A. Tisserand. MLP computing and learning
on FPGA using on-line arithmetic. International Journal of
Systems Research and Information Science, 1999. in press.
[HC90] R. Hartley and P. Corbett. Digit-serial processing tech-
niques. IEEE Transactions on Circuits and Systems,
37(6):707–719, 1990.
[IO79] M.J. Irwin and R.M. Owens. On-line algorithms for the de-
sign of pipeline architecture. In Proc. of the 4th Symposium
on Computer Architecture. IEEE Computer Society Press,
Los Alamitos, 1979.
[IO87] M.J. Irwin and R.M. Owens. Digit-pipelined arithmetic
as illustrated by the paste-up system: a tutorial. IEEE
Transactions on Computers, 20(4):61–73, 1987.
[Irw78] M.J. Irwin. A pipelined processing unit for on-line division.
In Proc. of the 5th Symposium on Computer Architecture,
pages 24–30. IEEE Computer Society Press, Los Alamitos,
1978.
[Kas98] R. Kasper. Tutorial: Tools for real-time motion control,
motion control with field programmable gate arrays. In
Proc. of the MOVIC 98, pages 1–20, Zürich, 1998.
[KKWB96] K. Kehr, S. Kurth, J. Wibbeler, and T. Böhm. Regelung
elektrostatischer Mikroaktuatoren mit digitalem Signal-
prozessor. In Proc. of the Workshop on mechanisch aktive
Mikrosysteme, Ilmenau, 1996.
[Kla93] S. Kla. Calcul Parallèle en En-Ligne des Fonctions
Arithmétiques. PhD thesis, École Normale Supérieure,
Lyon, 1993.
[LE92] M.E. Louie and M.D. Ercegovac. Mapping division algo-
rithms to field programmable gate arrays. In Proc. of the
1992 Asilomar Conference of Signals, Systems and Com-
puters, volume 1, pages 371–375. IEEE Computer Society
Press, Los Alamitos, 1992.
[LE93a] M.E. Louie and M.D. Ercegovac. A digit-recurrence square
root implementation for field programmable gate arrays. In
Proc. of IEEE Workshop on FPGAs for Custom Computing
Machines, pages 178–183. IEEE Computer Society Press,
Los Alamitos, 1993.
[LE93b] M.E. Louie and M.D. Ercegovac. On digit-recurrence di-
vision implementation for field programmable gate arrays.
In Proc. of the 11th IEEE Symposium on Computer Arith-
metic, pages 202–209, Windsor, 1993.
[LS87] H. Lin and H.J. Sips. A novel floating-point on-line divi-
sion algorithm. In Proc. of the 8th IEEE Symposium on
Computer Arithmetic, pages 188–197, Como, 1987. IEEE
Society Press, Washington.
[Mer94] X. Merrheim. Bases discrètes et calcul des fonctions
élémentaires par matériel. PhD thesis, École Normale
Supérieure, Lyon, 1994.
[MG86] R. Middleton and G. Goodwin. Improved finite wordlength
characteristics in digital control using delta opera-
tors. IEEE Transactions on Automatic Control, AC-
31(11):1015–1020, 1986.
[Mig94] M.A. Mignardi. Digital micromirror array for projection
TV. Solid State Technology, 37(7):63–68, 1994.
[MMY93] X. Merrheim, J.M. Muller, and H.J. Yeh. Fast evaluation
of polynomials and inverses of polynomials. In Proc. of
the 11th IEEE Symposium on Computer Arithmetic, pages
186–192, Windsor, 1993. IEEE Computer Society Press,
Los Alamitos.
[Mon95] L.A. Montalvo. A high performance division algorithm,
suitable for VLSI implementation, using hybrid addition
and subtraction. PhD thesis, L’Institut National Polytech-
nique, Grenoble, 1995.
[MP90] P.C. Mathias and L.M. Patnaik. Systolic evaluation of
polynomial expressions. IEEE Transactions on Computers,
39(5):653–665, 1990.
[MRM93] J. Moran, I. Rios, and J. Meneses. Signed digit arithmetic
on FPGAs. In Proc. of the International Workshop on Field
Programmable Logic and Applications, Oxford, 1993.
[Mul89] J.M. Muller. Arithmétique des Ordinateurs. Masson, Paris,
1989.
[Mul94] J.M. Muller. Some characterizations of functions com-
putable in on-line arithmetic. IEEE Transactions on Com-
puters, 43(6):752–755, 1994.
[NM96] A. Munk Nielsen and J.M. Muller. On-line arithmetic for
computing exponentials and logarithms. In Proc. of the Eu-
ropar’96, volume 2, pages 165–174. Springer Verlag, Berlin,
1996.
[OE82] V.G. Oklobdzija and M.D. Ercegovac. An on-line square
root algorithm. IEEE Transactions on Computers,
C-31(1):70–75, 1982.
[OI79] R.M. Owens and M.J. Irwin. On-line algorithms for the
design of pipeline architectures. In Proc. of the 6th Annual
Symposium on Computer Architecture, pages 12–19. IEEE,
New York, 1979.
[PI98] PI. NanoPositionierung. Physik Instrumente, Waldbronn,
1998.
[Tis94] A. Tisserand. Arithmétique en-ligne sur l’accélérateur
matériel DEC-PeRLe1. Master’s thesis, École Normale
Supérieure, Lyon, 1994.
[Tis97] A. Tisserand. Adéquation Arithmétique Architecture. PhD
thesis, École Normale Supérieure, Lyon, 1997.
[TMP99] A. Tisserand, P. Marchal, and C. Piguet. FPOP: Field pro-
grammable on-line operators. In Proc. of the SPIE Annual
Meeting, Advanced Signal Processing Algorithms, Architec-
tures and Implementations, Denver, 1999.
[Tri94] S. Trimberger. Field-Programmable Gate Array Technol-
ogy. Kluwer Academic Publishers, Boston, 1994.
[Tu90] P. Tu. On-Line Arithmetic Algorithms for Efficient Imple-
mentation. PhD thesis, UCLA, Los Angeles, 1990.
[Vac95] R. Vaccaro. Digital Control: A State-Space Approach.
McGraw-Hill, New York, 1995.

Curriculum Vitae
Name: Martin Dimmler
Date of birth: September 22, 1966
Place of birth: Karlsruhe, Germany
Education:
1995 to 1999 Swiss Federal Institute of Technology (EPFL):
Ph.D. project in the
Automatic Control Laboratory (IA).
Graduate courses in automatic control.
1994 to 1995 Research project in the
Microprocessor and Interface Laboratory (LAMI).
Graduate courses in computer sciences.
1993 to 1994 Swiss Federal Institute of Technology (ETHZ):
Research project in the
System Engineering Group (SEG).
Graduate courses in automatic control.
1986 to 1993 TH Karlsruhe, Germany:
Undergraduate and graduate studies in mechanical
engineering, specializing in automatic control.
1973 to 1986 Primary and secondary school, Karlsruhe, Germany.
Professional Experience:
1994 to 1999 Research assistant and tutor at EPFL and ETHZ.
1989 to 1993 Part-time software developer for Siemens, Germany.
Summer 1991 Siemens, USA. Internship in power plant technology.
1989 to 1993 TH Karlsruhe: tutor in Computer Aided Design.
