Optimal Realizations of Floating-Point Implemented Digital Controllers with Finite Word Length Considerations by Wu, J. et al.
JUN WU†, SHENG CHEN‡*, JAMES F. WHIDBORNE§ and JIAN CHU†
The closed-loop stability issue of ﬁnite word length (FWL) realizations is investigated for digital controllers implemented
in ﬂoating-point arithmetic. Unlike the existing methods which only address the eﬀect of the mantissa bits in ﬂoating-
point implementation to the sensitivity of closed-loop stability, the sensitivity of closed-loop stability is analysed with
respect to both the mantissa and exponent bits of ﬂoating-point implementation. A computationally tractable FWL
closed-loop stability measure is then deﬁned, and the method of computing the value of this measure is given. The
optimal controller realization problem is posed as searching for a ﬂoating-point realization that maximizes the proposed
FWL closed-loop stability measure, and a numerical optimization technique is adopted to solve for the resulting
optimization problem. Simulation results show that the proposed design procedure yields computationally eﬃcient
controller realizations with enhanced FWL closed-loop stability performance.
1. Introduction
The classical digital controller design methodology
often assumes that the controller is implemented exactly,
even though in reality a control law can only be realized
in ﬁnite precision. It may seem that the uncertainty
resulting from ﬁnite-precision implementation of the
digital controller is so small, compared to the uncer-
tainty within the plant, that this controller ‘uncertainty’
can simply be ignored. Increasingly, however, research-
ers have realized that this is not necessarily the case. Due
to the ﬁnite word length (FWL) eﬀect, a casual control-
ler implementation may degrade the designed closed-
loop performance or even destabilize the designed stable
closed-loop system, if the controller implementation
structure is not carefully chosen. The eﬀects of ﬁnite-
precision implementation have become more critical
with the growing popularity of robust controller design
methods which focus only on dealing with large plant
uncertainty (Keel and Bhattacharyya 1997, Istepanian
and Whidborne 2001). Generally speaking, there are
two types of FWL errors in the digital controller. The
ﬁrst one is perturbation of controller parameters imple-
mented with FWL and the second one is the rounding
errors that occur in arithmetic operations of signals.
Typically, eﬀects of these two types of errors are inves-
tigated separately for the reason of mathematical tract-
ability. The ﬁrst type of FWL error directly concerns the
critical issue of closed-loop stability, and many studies
have investigated some closed-loop stability robustness
measures, especially for ﬁxed-point implementation
(Fialho and Georgiou 1994, 2001, Madievski et al.
1995, Li 1998, Chen et al. 1999, Whidborne et al.
2000, 2001, Wu et al. 2001a,b). The second type of
FWL error can also lead to instability, through bounded
limit cycles or unbounded floating-point responses, and
how to eliminate its effect on stability has been the focus
of many researchers in control and digital filter
system design (Liu and Kaneko 1969, Kaneko 1973,
Miller et al. 1988, 1989, Bauer and Wang 1993,
Bauer 1995). Even when it does not cause unstable
behaviour, the second type of FWL error can still
degrade the system performance, and this effect
is usually measured and studied through the so-called
roundoff noise gain (Moroney et al. 1980, Williamson
and Kadiman 1989, Li and Gevers 1990, Liu et al. 1992,
Li et al. 2002).
Most works for FWL controller design adopt an
indirect strategy, which relies on the following property.
A control law can be implemented with diﬀerent realiza-
tions, and these diﬀerent realizations are all equivalent
if they are implemented in inﬁnite precision. However,
diﬀerent controller realizations possess diﬀerent degrees
of robustness to FWL errors. The control law is
assumed to be given by some controller design methods,
which may not take into account FWL considerations,
and the FWL design is to select optimal realizations
for the given control law by optimizing some FWL
criteria. An alternative but better approach is to expli-
citly incorporate the FWL issues into the controller
design process. For example, in the work of Liu et al.
(1992), an FWL–LQG performance index was used to
describe the LQG performance under FWL environ-
ment, and a ﬁxed-order controller realization design
method was presented to minimize this FWL–LQG
cost function. This direct strategy should be a preferred
approach, since it does not make specific assumptions
on the controller. However, how to extend the idea of
Liu et al. (1992) to various controller design methods
is still an open problem. This difficulty does not exist
in the indirect strategy, where controller synthesis and
controller realization are two separate steps. Various
existing controller design methods can be used to attain
a transfer function or an initial realization of the
controller, which can then be optimized to satisfy FWL
implementation requirements.

International Journal of Control ISSN 0020–7179 print/ISSN 1366–5820 online © 2004 Taylor & Francis Ltd
http://www.tandf.co.uk/journals
DOI: 10.1080/0020717042000208296
INT. J. CONTROL, 20 MARCH 2004, VOL. 77, NO. 5, 427–440
Received 1 August 2003. Revised 30 January 2004.
*Author for correspondence. e-mail: sqc@ecs.soton.ac.uk
†National Key Laboratory of Industrial Control Technology, Institute of Advanced Process Control, Zhejiang University, Hangzhou 310027, P. R. China.
‡School of Electronics and Computer Science, University of Southampton, Highfield, Southampton SO17 1BJ, UK.
§Department of Aerospace Sciences, School of Engineering, Cranfield University, Bedfordshire MK43 0AL, UK.

In real-time applications where computational efficiency
is critical, a digital controller implemented in
fixed-point arithmetic has some advantages over the
floating-point format. However, the detrimental FWL effects are
markedly increased in ﬁxed-point implementation due
to a reduced precision. It is therefore not surprising
that previous works have focused on ﬁnding optimal
controller realizations using ﬁxed-point arithmetic by
optimizing some FWL measures (Li and Gevers 1990,
Liu et al. 1992, Gevers and Li 1993, Fialho and
Georgiou 1994, 2001, Madievski et al. 1995, Li 1998,
Chen et al. 1999, Whidborne et al. 2000, 2001, Wu
et al. 2001a, b, Li et al. 2002). In all the previous
works using ﬁxed-point arithmetic, various measures,
which can be shown to link to the bits required in imple-
menting the fractional part of ﬁxed-point representa-
tion, are optimized to produce optimal realizations.
However, the dynamic range of ﬁxed-point representa-
tion is determined by its integer part. Overﬂow occurs
when there are not enough bits for the integer part.
Optimizing these measures, while minimizing the bits
required for the fractional part, may actually increase
the bits required for the integer part. Arguably, a better
approach would be to consider some measure which has
a direct link to the total bit length required.
With a decrease in price and increase in availability,
the use of ﬂoating-point processors in controller
implementations has increased dramatically. Floating-
point representation has quite diﬀerent characteristics
from ﬁxed-point representation. The dynamic range of
ﬂoating-point representation is determined by its expo-
nent part. Overﬂow or underﬂow occurs when the bits
for the exponent part are not suﬃcient. The eﬀects
of ﬁnite-precision ﬂoating-point implementation have
been well studied in digital filter designs (Kalliojärvi
and Astola 1996, Rao 1996, Ralev and Bauer 1999).
However, there has been relatively little work studying
explicitly ﬂoating-point digital controller implemen-
tations. Some exceptions include Rink and Chong
(1979), Molchanov and Bauer (1995) and Whidborne
and Gu (2002). In the work by Istepanian et al.
(2000), a block-ﬂoating-point arithmetic was used, in
which control coeﬃcients were forced to have a common
exponent and the problem was converted into a
ﬁxed-point one. The work by Whidborne and Gu
(2002) represents a case of true ﬂoating-point implemen-
tation. In this work, a weighted closed-loop eigenvalue
sensitivity index was deﬁned for ﬂoating-point digital
controller realizations. This index, however, only con-
siders the mantissa part of ﬂoating-point arithmetic,
under an assumption that the exponent bits are
unlimited.
This paper adopts an indirect approach to consider
the FWL parameter errors of ﬂoating-point implemen-
ted controllers. The generic contribution of this paper
is to derive a new FWL closed-loop stability measure
that explicitly considers both the mantissa and exponent
parts of ﬂoating-point arithmetic. The remainder of this
paper is organized as follows. Section 2 brieﬂy sum-
marizes the ﬂoating-point representation and highlights
the multiplicative nature of perturbations resulting from
FWL ﬂoating-point arithmetic. Section 3 analyses the
FWL eﬀect of ﬂoating-point arithmetic on closed-loop
stability and addresses how to measure such an eﬀect on
ﬂoating-point implemented digital controllers. Section 4
deﬁnes a computationally tractable FWL closed-loop
stability measure for ﬂoating-point controller realiza-
tions and provides the method of computing its value.
In §5, the optimal floating-point controller realization
problem is formulated, and a numerical optimization
technique is adopted to solve the resulting optimization
problem. Two examples are given in §6 to demonstrate
the effectiveness of the proposed design method.
Section 7 presents a brief discussion on the direct
approach of Liu et al. (1992) and points out that
studies on optimizing FWL realizations for a fixed
control law, such as this work, help to explore
possible ways of extending the idea of Liu et al.
(1992). The paper concludes with §8.
2. Floating-point representation
Let the floor function $\lfloor x \rfloor$ denote the largest integer less than or equal to the real number $x$. It is well known that any real number $x \in \mathbb{R}$ can be represented uniquely by

$$x = (-1)^s \times w \times 2^e \qquad (1)$$

where $s \in \{0, 1\}$ is for the sign of $x$, $w \in [0.5, 1)$ is the mantissa of $x$, and $e = \lfloor \log_2 |x| \rfloor + 1 \in \mathbb{Z}$ is the exponent of $x$, with $\mathbb{Z}$ denoting the set of integers. When $x$ is stored in a digital computer of finite $\beta$ bits in a floating-point format, the bits consist of three parts: one bit for $s$, $\beta_w$ bits for $w$ and $\beta_e$ bits for $e$. Obviously,

$$\beta = 1 + \beta_w + \beta_e. \qquad (2)$$

As the finite $\beta_e$ bits can only support a limited exponent range, we define $\underline{e}$ and $\bar{e}$ to represent the lower and upper limits of the exponent range, respectively, and denote the exponent range that is supported by $\beta_e$ bits as

$$\mathbb{Z}_{[\underline{e}, \bar{e}]} \triangleq \{ e \mid e \in \mathbb{Z},\ \underline{e} \le e \le \bar{e} \}. \qquad (3)$$

In fact, the exponent range $\mathbb{Z}_{[\underline{e}, \bar{e}]}$ depends not only on $\beta_e$ but also on the set of real numbers which is to be represented. As an example, consider the set of three numbers $\{0.7 \times 2^{-1},\ -0.9 \times 2,\ 0.8 \times 2^{2}\}$. At least two bits are required to describe their exponents, with 00 representing $-1$, 01 for 0, 10 representing 1 and 11 for 2. Thus, $\underline{e} = -1$, $\bar{e} = 2$ and $\mathbb{Z}_{[-1, 2]} = \{-1, 0, 1, 2\}$ are determined by the three numbers represented in this example of exponent bits $\beta_e = 2$. Obviously

$$\bar{e} - \underline{e} = 2^{\beta_e} - 1. \qquad (4)$$
Overflow and underflow can occur in floating-point arithmetic of FWL. Overflow occurs when a floating-point scheme with $\mathbb{Z}_{[\underline{e}, \bar{e}]}$ is used to represent a real number whose exponent is greater than $\bar{e}$, while underflow occurs when a floating-point scheme with $\mathbb{Z}_{[\underline{e}, \bar{e}]}$ is used to represent a real number whose exponent is smaller than $\underline{e}$. It should be emphasized that in many practical problems, the problem objective function is highly sensitive to small parameter perturbation and, therefore, small numbers should not simply be 'underflowed' to zero. For a demonstration, we refer to the so-called fragility issue (Keel and Bhattacharyya 1997). In floating-point arithmetic with FWL, underflow should generally be treated as seriously as overflow, and avoided if possible.
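Since $\beta_w$ and $\beta_e$ are finite, the set of numbers that is represented by a particular floating-point scheme is not dense on the real line. Thus the set of possible floating-point numbers is given by

$$\mathcal{F} \triangleq \left\{ (-1)^s \left( 0.5 + \sum_{i=1}^{\beta_w} b_i 2^{-(i+1)} \right) 2^e : s \in \{0,1\},\ b_i \in \{0,1\},\ e \in \mathbb{Z}_{[\underline{e}, \bar{e}]} \right\} \cup \{0\}. \qquad (5)$$

When no underflow or overflow occurs, that is, the exponent of $x$ is within $\mathbb{Z}_{[\underline{e}, \bar{e}]}$, the floating-point quantization operator $\mathcal{Q}: \mathbb{R} \to \mathcal{F}$ can be defined as

$$\mathcal{Q}(x) \triangleq \begin{cases} \mathrm{sgn}(x)\, 2^{(e - \beta_w - 1)} \lfloor 2^{(\beta_w - e + 1)} |x| + 0.5 \rfloor, & \text{for } x \ne 0 \\ 0, & \text{for } x = 0. \end{cases} \qquad (6)$$

In the above definition, magnitude rounding is used as the mantissa quantization format. Define the quantization error $\varepsilon$ as

$$\varepsilon \triangleq |x - \mathcal{Q}(x)|. \qquad (7)$$

Then

$$\begin{aligned} \varepsilon &= \left| \mathrm{sgn}(x) |x| - \mathrm{sgn}(x)\, 2^{(e - \beta_w - 1)} \lfloor 2^{(\beta_w - e + 1)} |x| + 0.5 \rfloor \right| \\ &= 2^{(e - \beta_w - 1)} \left| 2^{(\beta_w - e + 1)} |x| - \lfloor 2^{(\beta_w - e + 1)} |x| + 0.5 \rfloor \right| \\ &\le 2^{(e - \beta_w - 1)} \times 0.5. \end{aligned} \qquad (8)$$

From the definition of the exponent $e$, we have

$$2^{e} \times 0.5 = 2^{\lfloor \log_2 |x| \rfloor} \le 2^{\log_2 |x|} = |x|. \qquad (9)$$

Combining (8) and (9) leads to

$$\varepsilon \le |x|\, 2^{-(\beta_w + 1)}. \qquad (10)$$

Thus, when $x$ is implemented in floating-point format of $\beta_w$ mantissa bits, assuming no underflow or overflow, it can be seen from (7) and (10) that $x$ is perturbed to

$$\mathcal{Q}(x) = x(1 + \delta), \quad |\delta| \le 2^{-(\beta_w + 1)}. \qquad (11)$$

Clearly, the perturbation resulting from finite-precision floating-point arithmetic is multiplicative, unlike the perturbation resulting from finite-precision fixed-point arithmetic, which is additive.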
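
The quantization operator (6) and the multiplicative bound (11) can be checked numerically. The following Python sketch is an illustration only: the function name `fp_quantize` and the sample values are assumptions introduced here, and the exponent range is taken to be unlimited so that no underflow or overflow occurs.

```python
import math

def fp_quantize(x: float, beta_w: int) -> float:
    """Round x to beta_w mantissa bits as in (6); exponent assumed unbounded."""
    if x == 0.0:
        return 0.0
    # Exponent of (1): |x| = w * 2**e with w in [0.5, 1).
    e = math.floor(math.log2(abs(x))) + 1
    scaled = abs(x) * 2.0 ** (beta_w - e + 1)
    magnitude = math.floor(scaled + 0.5) * 2.0 ** (e - beta_w - 1)
    return math.copysign(magnitude, x)

# Illustrative sample values (not taken from the paper, except 0.66).
samples = (0.66, -3.2, 1e-3, 12345.678)
for x in samples:
    for beta_w in (1, 3, 8):
        q = fp_quantize(x, beta_w)
        delta = (q - x) / x                  # multiplicative perturbation in (11)
        bound = 2.0 ** -(beta_w + 1)
        print(f"x={x:<10} beta_w={beta_w} Q(x)={q:<14} |delta|={abs(delta):.3e} bound={bound:.3e}")
```

For every sample the relative perturbation stays within $2^{-(\beta_w+1)}$, in line with (11).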
3. Problem statement
Consider the discrete-time closed-loop control system, consisting of a linear time-invariant plant $P(z)$ and a digital controller $C(z)$. The plant model $P(z)$ is assumed to be strictly proper with a state-space description $(A_P, B_P, C_P)$, where $A_P \in \mathbb{R}^{m \times m}$, $B_P \in \mathbb{R}^{m \times l}$ and $C_P \in \mathbb{R}^{q \times m}$. Let $(A_C, B_C, C_C, D_C)$ be a state-space description of the controller $C(z)$, with $A_C \in \mathbb{R}^{n \times n}$, $B_C \in \mathbb{R}^{n \times q}$, $C_C \in \mathbb{R}^{l \times n}$ and $D_C \in \mathbb{R}^{l \times q}$. A linear system with a given transfer function matrix has an infinite number of state-space descriptions. In fact, if $(A_C^0, B_C^0, C_C^0, D_C^0)$ is a state-space description of $C(z)$, all the state-space descriptions of $C(z)$ form a realization set

$$\mathcal{S}_C \triangleq \left\{ (A_C, B_C, C_C, D_C) \mid A_C = T^{-1} A_C^0 T,\ B_C = T^{-1} B_C^0,\ C_C = C_C^0 T,\ D_C = D_C^0 \right\} \qquad (12)$$

where the transformation matrix $T \in \mathbb{R}^{n \times n}$ is an arbitrary non-singular matrix. Denote

$$X = [x_{j,k}] \triangleq \begin{bmatrix} D_C & C_C \\ B_C & A_C \end{bmatrix}. \qquad (13)$$

The stability of the closed-loop control system depends on the eigenvalues of the closed-loop transition matrix

$$A(X) = \begin{bmatrix} A_P + B_P D_C C_P & B_P C_C \\ B_C C_P & A_C \end{bmatrix} = \begin{bmatrix} A_P & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} B_P & 0 \\ 0 & I_n \end{bmatrix} X \begin{bmatrix} C_P & 0 \\ 0 & I_n \end{bmatrix} \triangleq M_0 + M_1 X M_2 \qquad (14)$$

where 0 denotes the zero matrix of appropriate dimension and $I_n$ the $n \times n$ identity matrix. All the different realizations $X$ in $\mathcal{S}_C$ have exactly the same set of closed-loop poles if they are implemented with infinite precision. Since the closed-loop system has been designed to be stable, all the eigenvalues $\lambda_i(A(X))$, $1 \le i \le m + n$, are within the unit disk. Define

$$\|X\|_{\max} \triangleq \max_{j,k} |x_{j,k}| \qquad (15)$$

and

$$g(X) \triangleq \min_{j,k} \{ |x_{j,k}| : x_{j,k} \ne 0 \}. \qquad (16)$$

The controller $X$ is implemented with a floating-point processor of $\beta_e$ exponent bits, $\beta_w$ mantissa bits and one sign bit.
First, in order to avoid underflow and/or overflow, both the exponent of $\|X\|_{\max}$ and the exponent of $g(X)$ should be within $\mathbb{Z}_{[\underline{e}, \bar{e}]}$ supported by the $\beta_e$ exponent bits. We define an exponent measure for the floating-point controller realization $X$ as

$$\gamma(X) \triangleq \log_2 \left( \frac{4 \|X\|_{\max}}{g(X)} \right). \qquad (17)$$

The rationale of this exponent measure becomes clear in the following (obvious) proposition.

Proposition 1: $X$ can be represented in the floating-point format of $\beta_e$ exponent bits without underflow or overflow, if

$$2^{\beta_e} \ge \log_2 \left( \frac{\|X\|_{\max}}{g(X)} \right) + 2.$$

Let $\beta_e^{\min}$ be the smallest exponent bit length that, when used to implement $X$, can avoid underflow and overflow. It can be computed as

$$\beta_e^{\min} = -\left\lfloor -\log_2 \left( \lfloor \log_2 \|X\|_{\max} \rfloor - \lfloor \log_2 g(X) \rfloor + 1 \right) \right\rfloor. \qquad (18)$$

The measure $\gamma(X)$ provides an estimate of $\beta_e^{\min}$ as

$$\hat{\beta}_e^{\min} \triangleq -\lfloor -\log_2 \gamma(X) \rfloor. \qquad (19)$$

It is clear that $\hat{\beta}_e^{\min} \ge \beta_e^{\min}$.
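
The exponent measure (17) and the bit-length expressions (18) and (19) are simple to evaluate. The short Python sketch below is illustrative only: the function name `exponent_measures` is an assumption made here, and the data are the three numbers of the example in §2, arranged as a one-row matrix.

```python
import math

def exponent_measures(X):
    """Return (gamma, beta_e_min, beta_e_hat) for a matrix given as nested lists."""
    mags = [abs(v) for row in X for v in row if v != 0.0]
    x_max, g = max(mags), min(mags)            # ||X||_max of (15) and g(X) of (16)
    gamma = math.log2(4.0 * x_max / g)         # exponent measure (17)
    span = math.floor(math.log2(x_max)) - math.floor(math.log2(g)) + 1
    beta_e_min = -math.floor(-math.log2(span))   # smallest exponent bit length (18)
    beta_e_hat = -math.floor(-math.log2(gamma))  # estimate (19)
    return gamma, beta_e_min, beta_e_hat

# The three numbers 0.7 * 2**-1, -0.9 * 2 and 0.8 * 2**2 from the example in Section 2.
X_demo = [[0.7 * 2.0 ** -1, -0.9 * 2.0, 0.8 * 2.0 ** 2]]
gamma, beta_e_min, beta_e_hat = exponent_measures(X_demo)
print(gamma, beta_e_min, beta_e_hat)
```

For these numbers (18) returns two exponent bits, matching the example, while the estimate (19) is more conservative.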
Second, when there is no underflow or overflow, according to (11), $X$ is perturbed to $X + X \circ \Delta$ due to the effect of finite $\beta_w$ where

$$X \circ \Delta \triangleq [x_{j,k} \delta_{j,k}] \qquad (20)$$

represents the Hadamard product of $X$ and $\Delta = [\delta_{j,k}]$. Each element of $\Delta$ is bounded by $\pm 2^{-(\beta_w+1)}$, that is

$$\|\Delta\|_{\max} \le 2^{-(\beta_w+1)}. \qquad (21)$$

With the perturbation $\Delta$, $\lambda_i(A(X))$ is moved to $\lambda_i(A(X + X \circ \Delta))$. If an eigenvalue of $A(X + X \circ \Delta)$ is outside the open unit disk, the closed-loop system, designed to be stable, becomes unstable with the finite-precision floating-point implemented $X$.

It is therefore critical to know when the FWL error will cause closed-loop instability. This means that we would like to know the largest open 'hypercube' in the perturbation space, within which the closed-loop system remains stable. Based on this consideration, a mantissa measure for the floating-point realization $X$ can be defined as

$$\mu_0(X) \triangleq \inf \{ \|\Delta\|_{\max} : A(X + X \circ \Delta) \text{ is unstable} \}. \qquad (22)$$

From the above definition, the following proposition is obvious.

Proposition 2: $A(X + X \circ \Delta)$ is stable if $\|\Delta\|_{\max} < \mu_0(X)$.
Let $\beta_w^{\min}$ be the mantissa bit length such that, $\forall \beta_w \ge \beta_w^{\min}$, $A(X + X \circ \Delta)$ is stable for the floating-point implemented $X$ with $\beta_w$ mantissa bits and $A(X + X \circ \Delta)$ is unstable for the floating-point implemented $X$ with $\beta_w^{\min} - 1$ mantissa bits. Except through simulation, $\beta_w^{\min}$ is generally unknown. It should be pointed out that due to the complex non-linear relationship between $\beta_w$ and closed-loop stability, there may exist some odd cases of smaller mantissa bit length $\beta_w < \beta_w^{\min} - 1$ which regain closed-loop stability. For example, consider the stable closed-loop system containing the plant

$$P(z) = \frac{-1.66 (z - 1.2)(z - 1.1)(z - 0.4)}{(z^2 - 0.35 z + 0.49)(z + 4)}$$

and the controller $C(z) = K = 0.66$. When $\beta_w \ge 4$, the closed-loop system with the FWL implemented $K$ is stable, but it becomes unstable with $\beta_w = 3$ where the implemented value of $K$ is 0.6875. However, the closed-loop regains stability with $\beta_w = 2$ where the implemented value of $K$ is 0.625. The system becomes unstable again for $\beta_w = 1$ where the implemented value of $K$ is 0.75. Figure 1 shows the root locus plot of this third-order system which gives the closed-loop pole positions for all values of $K$. From figure 1, it can be seen that the system is unstable when the implemented value of $K$ is greater than 0.686 or less than 0.513. For this system, $\beta_w^{\min}$ is 4 rather than 2. The mantissa measure $\mu_0(X)$ provides an estimate of $\beta_w^{\min}$ as

$$\hat{\beta}_{w0}^{\min} \triangleq -\lfloor \log_2 \mu_0(X) \rfloor - 1. \qquad (23)$$

It can be seen that $\hat{\beta}_{w0}^{\min} \ge \beta_w^{\min}$.
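
The non-monotonic behaviour in the example above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: NumPy is used for the polynomial arithmetic, the helper names are hypothetical, the plant is taken with the leading factor −1.66, and the feedback sign follows the convention of (14), so that the closed-loop characteristic polynomial is den(z) − K num(z).

```python
import math
import numpy as np

def fp_quantize(x, beta_w):
    """Round x to beta_w mantissa bits as in (6); exponent assumed unbounded."""
    if x == 0.0:
        return 0.0
    e = math.floor(math.log2(abs(x))) + 1
    scaled = abs(x) * 2.0 ** (beta_w - e + 1)
    return math.copysign(math.floor(scaled + 0.5) * 2.0 ** (e - beta_w - 1), x)

# Plant polynomials of the example (coefficients in descending powers of z).
DEN = np.polymul([1.0, -0.35, 0.49], [1.0, 4.0])
NUM = -1.66 * np.poly([1.2, 1.1, 0.4])

def closed_loop_poles(K):
    # Convention of (14): u = K y, hence the poles are the roots of den(z) - K num(z).
    return np.roots(np.polysub(DEN, K * NUM))

def is_stable(K):
    return bool(np.max(np.abs(closed_loop_poles(K))) < 1.0)

for beta_w in (4, 3, 2, 1):
    K_q = fp_quantize(0.66, beta_w)
    print(f"beta_w={beta_w}: implemented K={K_q}, stable={is_stable(K_q)}")
```

The printed stability flags alternate as described in the text, which is why the mantissa bit length cannot be reduced by simply testing shorter word lengths until the first stable one is found.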
Define the minimum total bit length required in floating-point implementation as

$$\beta^{\min} \triangleq \beta_e^{\min} + \beta_w^{\min} + 1. \qquad (24)$$

Clearly, a floating-point implemented $X$ with a bit length $\beta \ge \beta^{\min}$ can guarantee no underflow, no overflow and closed-loop stability. Combining the measures $\gamma(X)$ and $\mu_0(X)$ results in the following true FWL closed-loop stability measure for the floating-point realization $X$

$$\rho_0(X) \triangleq \mu_0(X) / \gamma(X). \qquad (25)$$

An estimate of $\beta^{\min}$ is given by $\rho_0(X)$ as

$$\hat{\beta}_0^{\min} \triangleq -\lfloor \log_2 \rho_0(X) \rfloor + 1. \qquad (26)$$

It is clear that $\hat{\beta}_0^{\min} \ge \beta^{\min}$. The following proposition summarizes the usefulness of $\rho_0(X)$ as a measure for the FWL characteristics of $X$.

Proposition 3: A floating-point implemented $X$ with a bit length $\beta$ can guarantee no underflow, no overflow and closed-loop stability, if

$$2^{\beta - 1} \ge \frac{1}{\rho_0(X)}. \qquad (27)$$

Since the closed-loop stability measure $\rho_0(X)$ is a function of the controller realization $X$ and $\hat{\beta}_0^{\min}$ decreases with the increase of $\rho_0(X)$, an optimal realization can in theory be found by maximizing $\rho_0(X)$, leading to the optimal controller realization problem

$$\upsilon_{\text{true}} \triangleq \max_{X \in \mathcal{S}_C} \rho_0(X). \qquad (28)$$

However, the difficulty with this approach is that computing the value of $\mu_0(X)$ is an unsolved open problem. Thus, the true FWL closed-loop stability measure $\rho_0(X)$ and the optimal realization problem (28) have limited practical significance. In the next section, we will seek an alternative measure that can not only quantify FWL characteristics of $X$ but is also computationally tractable.
4. A tractable FWL closed-loop stability measure
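When the FWL error $\Delta$ is small, from a first-order approximation, $\forall i \in \{1, \ldots, m+n\}$

$$|\lambda_i(A(X + X \circ \Delta))| - |\lambda_i(A(X))| \approx \sum_{j=1}^{l+n} \sum_{k=1}^{q+n} \left. \frac{\partial |\lambda_i|}{\partial \delta_{j,k}} \right|_{\Delta = 0} \delta_{j,k}. \qquad (29)$$

For the derivative matrix $\partial |\lambda_i| / \partial \Delta = [\partial |\lambda_i| / \partial \delta_{j,k}]$, define

$$\left\| \frac{\partial |\lambda_i|}{\partial \Delta} \right\|_{\text{sum}} \triangleq \sum_{j,k} \left| \frac{\partial |\lambda_i|}{\partial \delta_{j,k}} \right|. \qquad (30)$$

Then

$$|\lambda_i(A(X + X \circ \Delta))| - |\lambda_i(A(X))| \le \|\Delta\|_{\max} \left\| \left. \frac{\partial |\lambda_i|}{\partial \Delta} \right|_{\Delta = 0} \right\|_{\text{sum}}. \qquad (31)$$

This leads to the following mantissa measure for the floating-point realization $X$

$$\mu_1(X) \triangleq \min_{i \in \{1, \ldots, m+n\}} \frac{1 - |\lambda_i(A(X))|}{\left\| \left. \partial |\lambda_i| / \partial \Delta \right|_{\Delta = 0} \right\|_{\text{sum}}}. \qquad (32)$$

For those FWL errors that make (31) hold, if $\|\Delta\|_{\max} < \mu_1(X)$, then $|\lambda_i(A(X + X \circ \Delta))| < 1$ which means that the closed-loop remains stable under the FWL error $\Delta$. In other words, the closed-loop can tolerate those FWL perturbations $\Delta$ whose norms $\|\Delta\|_{\max}$ are less than $\mu_1(X)$. The larger $\mu_1(X)$ is, the larger FWL errors the closed-loop system can tolerate. Similar to (23), from the mantissa measure $\mu_1(X)$, an estimate of $\beta_w^{\min}$ is given as

$$\hat{\beta}_{w1}^{\min} \triangleq -\lfloor \log_2 \mu_1(X) \rfloor - 1. \qquad (33)$$

The assumption of small $\Delta$ is usually valid in floating-point implementation. Generally speaking, there is no rigorous relationship between $\mu_0(X)$ and $\mu_1(X)$, but $\mu_1(X)$ is connected with a lower bound of $\mu_0(X)$ in some ways: there are 'stable perturbation hypercubes' larger than $\{\Delta : \|\Delta\|_{\max} < \mu_1(X)\}$ while there is no 'stable perturbation hypercube' larger than $\{\Delta : \|\Delta\|_{\max} < \mu_0(X)\}$ (Wu et al. 2000, 2001a). Hence, in most cases, it is reasonable to take that $\mu_1(X) \le \mu_0(X)$ and $\hat{\beta}_{w1}^{\min} \ge \hat{\beta}_{w0}^{\min}$. More importantly, unlike the measure $\mu_0(X)$, the value of $\mu_1(X)$ can be computed explicitly. It is easy to see that

$$\left. \frac{\partial |\lambda_i|}{\partial \Delta} \right|_{\Delta = 0} = \frac{\partial |\lambda_i|}{\partial X} \circ X. \qquad (34)$$

Figure 1. Root locus plot of a third-order system (the gains K = 0.686 and K = 0.513 are marked on the locus).
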
Let $p_i$ be a right eigenvector of $A(X)$ corresponding to the eigenvalue $\lambda_i$. Define

$$M_p \triangleq \begin{bmatrix} p_1 & p_2 & \cdots & p_{m+n} \end{bmatrix} \qquad (35)$$

and

$$M_y \triangleq \begin{bmatrix} y_1 & y_2 & \cdots & y_{m+n} \end{bmatrix} = M_p^{-H} \qquad (36)$$

where the superscript $H$ denotes the conjugate transpose operator and $y_i$ is called the reciprocal left eigenvector related to $p_i$. The following lemma is due to Li (1998).

Lemma 1: Let $A(X) = M_0 + M_1 X M_2$ given in (14) be diagonalizable. Then

$$\frac{\partial \lambda_i}{\partial X} = M_1^T y_i^* p_i^T M_2^T \qquad (37)$$

where the superscript $*$ denotes the conjugate operation and $T$ the transpose operator.

Comments: The necessary and sufficient condition for $A(X)$ being diagonalizable is that it has $m + n$ linearly independent eigenvectors. This includes two cases. Firstly, $A(X)$ has $m + n$ distinct eigenvalues. In this case, we can differentiate eigenvalues simply by their values. Secondly, the eigenvalues of $A(X)$ are not all distinct but there are $m + n$ linearly independent eigenvectors. In this case, we can differentiate eigenvalues by their corresponding eigenvectors.

The following proposition shows that, given an $X$, the value of $\mu_1(X)$ can easily be calculated.

Proposition 4: Let $A(X)$ be diagonalizable. Then

$$\mu_1(X) = \min_{i \in \{1, \ldots, m+n\}} \frac{|\lambda_i| (1 - |\lambda_i|)}{\left\| \left( M_1^T \mathrm{Re}[\lambda_i^* y_i^* p_i^T] M_2^T \right) \circ X \right\|_{\text{sum}}}. \qquad (38)$$
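Proof: Noting $|\lambda_i| = \sqrt{\lambda_i^* \lambda_i}$ leads to

$$\begin{aligned} \frac{\partial |\lambda_i|}{\partial X} &= \frac{1}{2 \sqrt{\lambda_i^* \lambda_i}} \left( \frac{\partial \lambda_i^*}{\partial X} \lambda_i + \lambda_i^* \frac{\partial \lambda_i}{\partial X} \right) \\ &= \frac{1}{2 |\lambda_i|} \left( \left( \frac{\partial \lambda_i}{\partial X} \right)^{*} \lambda_i + \lambda_i^* \frac{\partial \lambda_i}{\partial X} \right) \\ &= \frac{1}{|\lambda_i|} \mathrm{Re}\left[ \lambda_i^* \frac{\partial \lambda_i}{\partial X} \right]. \end{aligned} \qquad (39)$$

Combining (32), (34), (39) and Lemma 1 results in this proposition. □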
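
Proposition 4 translates directly into a few lines of numerical linear algebra. The sketch below is one possible NumPy rendering, not the authors' code: the helper names and the small first-order plant with a first-order controller are assumptions chosen so that the result can be checked by hand, and $A(X)$ is assumed to be diagonalizable with non-zero eigenvalues.

```python
import numpy as np

def closed_loop_blocks(AP, BP, CP, n):
    """Return M0, M1, M2 of (14) for a controller of order n."""
    m, l, q = AP.shape[0], BP.shape[1], CP.shape[0]
    M0 = np.block([[AP, np.zeros((m, n))], [np.zeros((n, m)), np.zeros((n, n))]])
    M1 = np.block([[BP, np.zeros((m, n))], [np.zeros((n, l)), np.eye(n)]])
    M2 = np.block([[CP, np.zeros((q, n))], [np.zeros((n, m)), np.eye(n)]])
    return M0, M1, M2

def mantissa_measure(X, M0, M1, M2):
    """Evaluate mu_1(X) from (32), using (34), (37) and (39)."""
    lam, Mp = np.linalg.eig(M0 + M1 @ X @ M2)
    My = np.linalg.inv(Mp).conj().T                 # reciprocal left eigenvectors (36)
    ratios = []
    for i in range(lam.size):
        dlam_dX = M1.T @ np.outer(My[:, i].conj(), Mp[:, i]) @ M2.T   # Lemma 1, (37)
        dabs_dX = np.real(np.conj(lam[i]) * dlam_dX) / abs(lam[i])    # (39)
        ratios.append((1.0 - abs(lam[i])) / np.sum(np.abs(dabs_dX * X)))
    return min(ratios)

# Hypothetical example: scalar plant, first-order controller X = [[D, C], [B, A]].
AP, BP, CP = np.array([[0.5]]), np.array([[1.0]]), np.array([[1.0]])
X = np.array([[0.0, 0.2],
              [0.3, 0.4]])
M0, M1, M2 = closed_loop_blocks(AP, BP, CP, n=1)
mu1 = mantissa_measure(X, M0, M1, M2)
print(mu1)
```

For this example the closed-loop eigenvalues are 0.7 and 0.2, and a hand calculation with (38) gives $\mu_1(X) = 0.75$, determined by the eigenvalue 0.7.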
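Replacing $\mu_0(X)$ with $\mu_1(X)$ in (25) leads to a computationally tractable FWL closed-loop stability measure

$$\rho_1(X) \triangleq \mu_1(X) / \gamma(X). \qquad (40)$$

From the above measure, an estimate of $\beta^{\min}$ is given as

$$\hat{\beta}_1^{\min} \triangleq -\lfloor \log_2 \rho_1(X) \rfloor + 1. \qquad (41)$$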
Note that the computationally tractable mantissa measure (32) is related to the eigenvalue modulus sensitivities with respect to (w.r.t.) the controller perturbation. This is similar to the case of controller realizations implemented in fixed-point arithmetic, where an existing FWL precision measure is defined as (Wu et al. 2001a)

$$\mu_f(X) \triangleq \min_{i \in \{1, \ldots, m+n\}} \frac{1 - |\lambda_i(A(X))|}{\left\| \partial |\lambda_i| / \partial X \right\|_{\text{sum}}}. \qquad (42)$$

The idea underpinning $\mu_1(X)$ in (32), namely the sensitivity w.r.t. controller perturbation, is the same as the sensitivity w.r.t. controller parameters that underpins $\mu_f(X)$ in (42). In fact, it is well known that with an FWL fixed-point implementation, $X$ is perturbed to $X + \Delta$ and

$$|\lambda_i(A(X + \Delta))| - |\lambda_i(A(X))| \approx \sum_{j=1}^{l+n} \sum_{k=1}^{q+n} \left. \frac{\partial |\lambda_i|}{\partial \delta_{j,k}} \right|_{\Delta = 0} \delta_{j,k}. \qquad (43)$$

Obviously, in the fixed-point case, we have

$$\left. \frac{\partial |\lambda_i|}{\partial \Delta} \right|_{\Delta = 0} = \frac{\partial |\lambda_i|}{\partial X} \qquad (44)$$

and the fixed-point FWL measure $\mu_f(X)$ can be written as

$$\mu_f(X) = \min_{i \in \{1, \ldots, m+n\}} \frac{1 - |\lambda_i(A(X))|}{\left\| \left. \partial |\lambda_i| / \partial \Delta \right|_{\Delta = 0} \right\|_{\text{sum}}}. \qquad (45)$$

On the other hand, from (32) and (34), it can be seen that

$$\mu_1(X) = \min_{i \in \{1, \ldots, m+n\}} \frac{1 - |\lambda_i(A(X))|}{\left\| (\partial |\lambda_i| / \partial X) \circ X \right\|_{\text{sum}}} \qquad (46)$$

which is clearly linked to the eigenvalue modulus sensitivities w.r.t. the controller parameters. The Hadamard product in (46) merely reflects the multiplicative characteristic of floating-point perturbations.

It is also useful to compare the proposed measure with the previous results for floating-point format, especially the most recent one given by Whidborne and Gu (2002). For a complex-valued matrix $Y = [y_{j,k}]$, define the Frobenius norm

$$\|Y\|_F \triangleq \left( \sum_{j,k} y_{j,k}^* y_{j,k} \right)^{1/2}. \qquad (47)$$

Under an assumption that the exponent bits are unlimited, the computationally tractable weighted closed-loop eigenvalue sensitivity index addressed in Whidborne and Gu (2002) is given by

$$U(X) \triangleq \sum_{i=1}^{m+n} \omega_i U_i(X) \qquad (48)$$

where $\omega_i$ are non-negative weighting scalars and $U_i(X)$ are single-eigenvalue sensitivities defined by

$$U_i(X) \triangleq \|X\|_F^2 \left\| \frac{\partial \lambda_i}{\partial X} \right\|_F^2. \qquad (49)$$

The thinking behind the above definition is as follows. From a first-order approximation, it can easily be shown that

$$|\lambda_i(A(X + X \circ \Delta)) - \lambda_i(A(X))| \le \|\Delta\|_{\max} \|X\|_F \left\| \frac{\partial \lambda_i}{\partial X} \right\|_F. \qquad (50)$$

Therefore, for those multiplicative perturbations bounded by $\|\Delta\|_{\max}$, a small $U_i(X)$ will limit the resulting change of the corresponding eigenvalue within a small range.
The first obvious observation is that $\rho_1(X)$ considers both the mantissa and exponent of floating-point arithmetic and is therefore able to handle all the aspects of underflow, overflow and closed-loop stability, while $U(X)$ only considers the mantissa part of floating-point arithmetic and is thus 'incomplete'. Secondly, it can be seen that $U(X)$ deals with the sensitivity of $\lambda_i$ while $\rho_1(X)$ ($\mu_1(X)$) considers the sensitivity of $|\lambda_i|$. It is well known that the stability of a discrete-time linear time-invariant system depends only on the moduli of its eigenvalues. As $U(X)$ includes the unnecessary eigenvalue arguments in consideration, it is generally conservative in comparison with $\mu_1(X)$. Third, $\mu_1(X)$ uses $\|(\partial |\lambda_i| / \partial X) \circ X\|_{\text{sum}}$ while $U(X)$ uses $\|X\|_F \|\partial \lambda_i / \partial X\|_F$ in checking the change of an eigenvalue. It is easy to see that

$$|\lambda_i(A(X + X \circ \Delta))| - |\lambda_i(A(X))| \le \|\Delta\|_{\max} \left\| \frac{\partial |\lambda_i|}{\partial X} \circ X \right\|_{\text{sum}} \le \|\Delta\|_{\max} \|X\|_F \left\| \frac{\partial \lambda_i}{\partial X} \right\|_F. \qquad (51)$$

Obviously, $\|(\partial |\lambda_i| / \partial X) \circ X\|_{\text{sum}}$ gives a more accurate limit than $\|X\|_F \|\partial \lambda_i / \partial X\|_F$ does on the change of the corresponding eigenvalue modulus due to the multiplicative perturbations. This again implies that $\mu_1(X)$ is less conservative than $U(X)$ in estimating the robustness of closed-loop stability with respect to controller perturbations. The fourth observation is that $\rho_1(X)$ provides an estimate of $\beta^{\min}$, $\hat{\beta}_1^{\min}$ in (41), while $U(X)$ cannot provide information on bit length to the designer. One reason is that the measure $\rho_1(X)$ consists of two components, with $\mu_1(X)$ addressing the stability margin and eigenvalue sensitivity linked to the mantissa bits and $\gamma(X)$ considering the exponent bits, while $U(X)$ only focuses on the eigenvalue sensitivity partially linked to the mantissa part. The other reason is that, over all the closed-loop eigenvalues, $\mu_1(X)$ considers the 'worst' one while $U(X)$ considers a 'weighted average'.
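
The second inequality in (51) can be verified in two steps; the following is a short sketch, writing $a_{j,k}$ for the entries of $\partial \lambda_i / \partial X$ (a symbol introduced here only for illustration). By (39), $|\partial |\lambda_i| / \partial x_{j,k}| = |\mathrm{Re}[\lambda_i^* a_{j,k}]| / |\lambda_i| \le |a_{j,k}|$, and the Cauchy–Schwarz inequality then gives

$$\left\| \frac{\partial |\lambda_i|}{\partial X} \circ X \right\|_{\text{sum}} = \sum_{j,k} \left| \frac{\partial |\lambda_i|}{\partial x_{j,k}} \right| |x_{j,k}| \le \sum_{j,k} |a_{j,k}|\, |x_{j,k}| \le \left( \sum_{j,k} |a_{j,k}|^2 \right)^{1/2} \left( \sum_{j,k} |x_{j,k}|^2 \right)^{1/2} = \|X\|_F \left\| \frac{\partial \lambda_i}{\partial X} \right\|_F.$$

Equality requires both that $\lambda_i^* a_{j,k}$ be real for every entry and that $|a_{j,k}|$ be proportional to $|x_{j,k}|$, which is why the sum-norm bound is generally the tighter of the two.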
Finally, it is worth emphasizing that the generic idea
of considering both the exponent, which deﬁnes the
dynamic range, and mantissa, which deﬁnes the accu-
racy or precision, of the ﬂoating-point arithmetic is a
sensible one and should be extended to other situations
where diﬀerent representation schemes, such as ﬁxed-
point format, are used.
5. Optimization procedure
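As different realizations $X$ have different values of the FWL closed-loop stability measure $\rho_1(X)$, it is of practical importance to find an 'optimal' realization $X_{\text{opt}}$ that maximizes $\rho_1(X)$. The controller implemented with this optimal realization $X_{\text{opt}}$ needs a minimum bit length and has a maximum tolerance to the FWL error. This optimal controller realization problem is formally defined as

$$\upsilon \triangleq \max_{X \in \mathcal{S}_C} \rho_1(X). \qquad (52)$$

Assume that a controller has been designed using some standard controller design method. This controller, denoted as

$$X_0 \triangleq \begin{bmatrix} D_C^0 & C_C^0 \\ B_C^0 & A_C^0 \end{bmatrix} \qquad (53)$$

is used as the initial controller realization in the above optimal controller realization problem. Let $p_{0i}$ be a right eigenvector of $A(X_0)$ corresponding to the eigenvalue $\lambda_i$ and $y_{0i}$ be the reciprocal left eigenvector related to $p_{0i}$. The definition of $\mathcal{S}_C$ in (12) means that

$$X \triangleq X(T) = \begin{bmatrix} I_l & 0 \\ 0 & T^{-1} \end{bmatrix} X_0 \begin{bmatrix} I_q & 0 \\ 0 & T \end{bmatrix} \qquad (54)$$

where $\det T \ne 0$. It can then be shown that

$$A(X) = \begin{bmatrix} I_m & 0 \\ 0 & T^{-1} \end{bmatrix} A(X_0) \begin{bmatrix} I_m & 0 \\ 0 & T \end{bmatrix} \qquad (55)$$

which implies that

$$p_i = \begin{bmatrix} I_m & 0 \\ 0 & T^{-1} \end{bmatrix} p_{0i}, \quad y_i = \begin{bmatrix} I_m & 0 \\ 0 & T^T \end{bmatrix} y_{0i}. \qquad (56)$$

Hence

$$M_1^T \mathrm{Re}[\lambda_i^* y_i^* p_i^T] M_2^T = \begin{bmatrix} I_l & 0 \\ 0 & T^T \end{bmatrix} M_1^T \mathrm{Re}[\lambda_i^* y_{0i}^* p_{0i}^T] M_2^T \begin{bmatrix} I_q & 0 \\ 0 & T^{-T} \end{bmatrix} \triangleq \begin{bmatrix} I_l & 0 \\ 0 & T^T \end{bmatrix} \Phi_i \begin{bmatrix} I_q & 0 \\ 0 & T^{-T} \end{bmatrix} \qquad (57)$$

with $\Phi_i = M_1^T \mathrm{Re}[\lambda_i^* y_{0i}^* p_{0i}^T] M_2^T$. Define the cost function

$$f(T) \triangleq \left( \max_{i \in \{1, \ldots, m+n\}} \frac{\left\| \left( \begin{bmatrix} I_l & 0 \\ 0 & T^T \end{bmatrix} \Phi_i \begin{bmatrix} I_q & 0 \\ 0 & T^{-T} \end{bmatrix} \right) \circ X(T) \right\|_{\text{sum}}}{|\lambda_i| (1 - |\lambda_i|)} \right) \log_2 \frac{4 \|X(T)\|_{\max}}{g(X(T))}. \qquad (58)$$

In the above definition of the cost function $f(T)$,

$$\log_2 \frac{4 \|X(T)\|_{\max}}{g(X(T))}$$

is simply $\gamma(X)$ which estimates the cost of exponent bits, while

$$\max_{i \in \{1, \ldots, m+n\}} \frac{\left\| \left( \begin{bmatrix} I_l & 0 \\ 0 & T^T \end{bmatrix} \Phi_i \begin{bmatrix} I_q & 0 \\ 0 & T^{-T} \end{bmatrix} \right) \circ X(T) \right\|_{\text{sum}}}{|\lambda_i| (1 - |\lambda_i|)}$$

is the inverse of $\mu_1(X)$ which estimates the cost of mantissa bits. Hence $f(T)$ can be used to measure the cost of total bits.

With the introduction of this cost function, the optimal controller realization problem (52) can then be posed as the optimization problem

$$\upsilon^{-1} = \min_{T \in \mathbb{R}^{n \times n},\ \det T \ne 0} f(T). \qquad (59)$$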
As the optimization problem (59) is highly non-linear, global optimization algorithms, such as the genetic algorithm (Man et al. 1998) and adaptive simulated annealing (Chen and Luk 1999), can be adopted to provide a (sub)optimal similarity transformation $T_{\text{opt}}$. Global optimization methods are, however, computationally demanding. Local optimization algorithms, such as Rosenbrock and Simplex algorithms (Beveridge and Schechter 1970), are computationally simpler but run more risks of only attaining a local solution. Our experience with the optimization problem (59) suggests that, unlike optimizing the mantissa measure $\mu_1(X)$ alone, the exponent measure $\gamma(X)$ in the criterion $\rho_1(X)$ helps to bound the solution set and the cost function $f(T)$ appears to behave better. It is also helpful to use some good initial controller realization, such as the open-loop balanced realization (Laub et al. 1987), as the initial guess for the optimization routine. With $T_{\text{opt}}$, the optimal realization $X_{\text{opt}}$ can readily be computed.
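
The optimization (59) can be prototyped as follows. This is a minimal sketch and not the implementation used in the paper: it assumes NumPy and SciPy, uses SciPy's Nelder–Mead simplex search as a local optimizer, evaluates $f(T)$ directly from $X(T)$ rather than through $\Phi_i$ (the two are equivalent by (57)), and works on a hypothetical scalar plant with a second-order controller. All function names are introduced here for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def closed_loop_blocks(AP, BP, CP, n):
    """Return M0, M1, M2 of (14) for a controller of order n."""
    m, l, q = AP.shape[0], BP.shape[1], CP.shape[0]
    M0 = np.block([[AP, np.zeros((m, n))], [np.zeros((n, m)), np.zeros((n, n))]])
    M1 = np.block([[BP, np.zeros((m, n))], [np.zeros((n, l)), np.eye(n)]])
    M2 = np.block([[CP, np.zeros((q, n))], [np.zeros((n, m)), np.eye(n)]])
    return M0, M1, M2

def realization(X0, T, l, q):
    """X(T) of (54)."""
    n = T.shape[0]
    left = np.block([[np.eye(l), np.zeros((l, n))], [np.zeros((n, l)), np.linalg.inv(T)]])
    right = np.block([[np.eye(q), np.zeros((q, n))], [np.zeros((n, q)), T]])
    return left @ X0 @ right

def cost(X, M0, M1, M2):
    """gamma(X) / mu_1(X), i.e. the cost (58) evaluated at a given realization."""
    lam, Mp = np.linalg.eig(M0 + M1 @ X @ M2)
    My = np.linalg.inv(Mp).conj().T
    worst = 0.0
    for i in range(lam.size):
        phi = M1.T @ np.real(np.conj(lam[i]) * np.outer(My[:, i].conj(), Mp[:, i])) @ M2.T
        worst = max(worst, np.sum(np.abs(phi * X)) / (abs(lam[i]) * (1.0 - abs(lam[i]))))
    mags = np.abs(X[X != 0.0])
    return worst * np.log2(4.0 * mags.max() / mags.min())

def f_of_T(t_flat, X0, M0, M1, M2, l, q, n):
    T = t_flat.reshape(n, n)
    if abs(np.linalg.det(T)) < 1e-9:      # keep away from singular transformations
        return np.inf
    return cost(realization(X0, T, l, q), M0, M1, M2)

# Hypothetical data: scalar plant and a second-order controller X0 = [[D, C], [B, A]].
AP, BP, CP = np.array([[0.5]]), np.array([[1.0]]), np.array([[1.0]])
X0 = np.array([[0.0, 0.1, 0.1],
               [0.1, 0.3, 0.1],
               [0.1, 0.0, 0.2]])
l, q, n = 1, 1, 2
M0, M1, M2 = closed_loop_blocks(AP, BP, CP, n)
args = (X0, M0, M1, M2, l, q, n)

f0 = f_of_T(np.eye(n).ravel(), *args)
res = minimize(f_of_T, np.eye(n).ravel(), args=args, method="Nelder-Mead")
T_opt = res.x.reshape(n, n)
X_opt = realization(X0, T_opt, l, q)
print("initial cost:", f0, "cost after local search:", res.fun)
```

Because the search is local, the result depends on the starting point; starting from the identity guarantees only that the cost found is no worse than that of the initial realization.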
6. Numerical examples
Two examples are used to illustrate the proposed
design procedure for obtaining optimal FWL ﬂoating-
point controller realizations and to compare it with the
method given in Whidborne and Gu (2002). In the simu-
lation, the bit length for implementing the state variables
was suﬃciently long that the second type of FWL error
can be neglected.
Example 1: This example, taken from Gevers and Li (1993), has been studied by Whidborne and Gu (2002). The discrete-time plant is given by
\[
A_P = \begin{bmatrix}
3.7156e+0 & -5.4143e+0 & 3.6525e+0 & -9.6420e-1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}, \quad
B_P = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}^T,
\]
\[
C_P = \begin{bmatrix} 1.1160e-6 & 4.3000e-8 & 1.0880e-6 & 1.4000e-8 \end{bmatrix}.
\]
The initial realization of the digital controller is given by
\[
A_C^0 = \begin{bmatrix}
2.6743e+0 & -5.7446e+0 & 2.5101e+0 & -9.1782e-1 \\
2.8769e-1 & -2.7446e-2 & -6.9444e-1 & -8.9358e-3 \\
-3.3773e-1 & 9.8699e-1 & -3.2925e-1 & -4.2367e-3 \\
-8.3021e-2 & -3.1988e-3 & 9.1906e-1 & -1.0415e-3
\end{bmatrix},
\]
\[
B_C^0 = \begin{bmatrix} 1.0959e+6 & 6.3827e+5 & 3.0262e+5 & 7.4392e+4 \end{bmatrix}^T,
\]
\[
C_C^0 = \begin{bmatrix} 1.8180e-1 & -2.8313e-1 & 5.0006e-2 & 6.1722e-2 \end{bmatrix}, \quad D_C^0 = 0.
\]
Based on the proposed FWL closed-loop stability measure, the optimization problem (59) was formed and solved using the MATLAB routine fminsearch.m, which is a local optimization search algorithm, to obtain an optimal transformation matrix
\[
T_{\mathrm{opt}} = \begin{bmatrix}
7.7275e+3 & -1.0904e+2 & -2.1292e+2 & 9.8042e+1 \\
6.9729e+3 & 2.1370e+3 & 4.4988e+1 & 2.1812e+2 \\
6.2844e+3 & 3.9092e+3 & 2.9303e+2 & 2.9240e+2 \\
5.5879e+3 & 5.2862e+3 & 5.5027e+2 & 3.4367e+2
\end{bmatrix}
\]
and the corresponding optimal realization of the digital controller $X_{\mathrm{opt}}$ given by
\[
A_C^{\mathrm{opt}} = \begin{bmatrix}
-1.4441e+0 & -1.0500e+0 & -6.0800e-2 & -1.0102e-1 \\
3.8412e+0 & 2.4034e+0 & 6.7143e-2 & 1.7402e-1 \\
-1.3159e+1 & -4.5856e+0 & 5.3403e-1 & -6.8843e-1 \\
3.2330e-1 & -2.1078e+0 & -6.6254e-2 & 8.2322e-1
\end{bmatrix},
\]
\[
B_C^{\mathrm{opt}} = \begin{bmatrix} 1.6342e+2 & -2.5378e+2 & 9.1370e+2 & -6.1106e-2 \end{bmatrix}^T,
\]
\[
C_C^{\mathrm{opt}} = \begin{bmatrix} 8.9770e+1 & -1.0310e+2 & -2.8290e+0 & -8.0995e+0 \end{bmatrix}, \quad D_C^{\mathrm{opt}} = 0.
\]
An ‘optimal’ controller realization problem was given in Whidborne and Gu (2002) based on the weighted closed-loop eigenvalue sensitivity index (48). We will use the index ‘s’, rather than ‘opt’, to denote the solution of this ‘optimal’ controller realization problem. For this example, the transformation matrix solution obtained using the MATLAB routine fminsearch.m given in Whidborne and Gu (2002) is
\[
T_s = \begin{bmatrix}
8.1477e+3 & 0 & 0 & 0 \\
7.0104e+3 & 2.2671e+3 & 0 & 0 \\
6.1991e+3 & 3.9989e+3 & 1.1558e+2 & 0 \\
5.6761e+3 & 5.2680e+3 & 3.5814e+2 & 1.5299e+1
\end{bmatrix}
\]
with the corresponding controller realization $X_s$ given by
\[
A_C^{s} = \begin{bmatrix}
-9.9795e-1 & -9.5988e-1 & -4.7357e-3 & -1.7234e-3 \\
2.1137e+0 & 1.6951e+0 & -2.2171e-2 & 5.2689e-3 \\
-1.4177e+0 & 6.1144e-1 & 6.7870e-1 & -9.0420e-2 \\
1.9428e+0 & -2.4577e+0 & 4.2234e-1 & 9.4079e-1
\end{bmatrix},
\]
\[
B_C^{s} = \begin{bmatrix} 1.3451e+2 & -1.3439e+2 & 5.3833e+1 & -2.5633e+1 \end{bmatrix}^T,
\]
\[
C_C^{s} = \begin{bmatrix} 1.5673e+2 & -1.1677e+2 & 2.7885e+1 & 9.4430e-1 \end{bmatrix}, \quad D_C^{s} = 0.
\]
It is obvious that the true minimum exponent bit length $\beta_e^{\min}$ for a realization $X$ can directly be obtained by examining the elements of $X$. The true minimum mantissa bit length $\beta_w^{\min}$, however, can only be obtained through simulation. That is, starting from a very large $\beta_w$, reduce $\beta_w$ by one bit and check the closed-loop stability. The process is repeated until closed-loop instability appears at $\beta_w = \beta_{wu}$. Then $\beta_w^{\min} = \beta_{wu} + 1$. Table 1 summarizes the various measures, the corresponding estimated minimum bit lengths and the true minimum bit lengths for the three controller realizations $X_0$, $X_s$ and $X_{\mathrm{opt}}$, respectively. It can be seen that the floating-point implementation of $X_0$ needs at least 26 bits (20 mantissa bits and five exponent bits, plus one sign bit) while the implementation of $X_{\mathrm{opt}}$ needs at least 13 bits (eight mantissa bits and four exponent bits, plus one sign bit). The reduction in the bit length required is 13 (12-bit reduction for the mantissa part and 1-bit reduction for the exponent part). Comparing $X_{\mathrm{opt}}$ with $X_s$, it can be seen that $X_{\mathrm{opt}}$ needs one bit less in the exponent part and one bit less in the mantissa part.
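The simulation-based search for the true minimum mantissa bit length can be sketched as follows. This is a minimal illustration under stated assumptions: the mantissa is taken to lie in [0.5, 1) (the convention of Python's `math.frexp`), rounding is to nearest, the exponent range is left unrestricted, and the scalar closed loop used as a stability check is a hypothetical toy, not one of the paper's examples. The names `round_mantissa`, `min_mantissa_bits` and `toy_is_stable` are illustrative.

```python
import math


def round_mantissa(x, bits):
    """Round x to `bits` mantissa bits, leaving the exponent unrestricted."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** bits
    return math.ldexp(round(m * scale) / scale, e)


def min_mantissa_bits(params, is_stable, start_bits=40):
    """Reduce the mantissa length one bit at a time until instability appears."""
    bits = start_bits
    while bits >= 1:
        quantized = [round_mantissa(p, bits) for p in params]
        if not is_stable(quantized):
            return bits + 1  # one more bit than the first unstable length
        bits -= 1
    return 1  # never became unstable down to a single mantissa bit


def toy_is_stable(params):
    """Toy loop: plant x(k+1) = 1.2 x(k) + u(k), controller u(k) = -g x(k)."""
    (g,) = params
    return abs(1.2 - g) < 1.0  # closed-loop pole strictly inside the unit circle


print(min_mantissa_bits([0.21], toy_is_stable))
```

In this toy case the gain 0.21 rounds to 0.1875 with two mantissa bits, which pushes the closed-loop pole to 1.0125, so the search stops there and reports three bits. For a full controller realization, `is_stable` would instead check the eigenvalues of the closed-loop transition matrix built from the quantized realization.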
Note that any realization $X \in \mathcal{S}_C$ implemented in infinite precision will achieve the exact performance of the infinite-precision implemented $X_0$, which is the designed controller performance. For this reason, the infinite-precision implemented $X_0$ is referred to as the ideal controller realization $X_{\mathrm{ideal}}$. Figure 2 compares the unit impulse response of the plant output $y(k)$ for the ideal controller $X_{\mathrm{ideal}}$ with those of the 8-mantissa-bit plus 5-exponent-bit implemented $X_s$ and the 8-mantissa-bit plus 4-exponent-bit implemented $X_{\mathrm{opt}}$. The 8-mantissa-bit implemented $X_0$ quickly becomes unstable and is not shown here. From figure 2, it can be seen that the closed-loop system with the 13-bit implemented $X_{\mathrm{opt}}$ is stable while the system with the 14-bit implemented $X_s$ is unstable. Figure 3 compares the unit impulse response of $y(k)$ for $X_{\mathrm{ideal}}$ with those of the 9-mantissa-bit plus 5-exponent-bit implemented $X_s$ and the 9-mantissa-bit plus 4-exponent-bit implemented $X_{\mathrm{opt}}$. Again the 9-mantissa-bit implemented $X_0$ is unstable and is not shown. It can be seen that the response with the 14-bit implemented $X_{\mathrm{opt}}$ is clearly closer to the ideal performance than that of the 15-bit implemented $X_s$.
Example 2: This example is taken from a continuous-time $H_\infty$ robust control example studied in Keel and Bhattacharyya (1997) and Whidborne et al. (2001). The continuous-time plant model and $H_\infty$ controller are sampled with a sampling period of 4 ms to obtain the discrete-time plant
\[
A_P = \begin{bmatrix}
1.9980e+0 & -9.9800e-1 \\
1 & 0
\end{bmatrix}, \quad
B_P = \begin{bmatrix} 1 & 0 \end{bmatrix}^T, \quad
C_P = \begin{bmatrix} 3.9880e-3 & -4.0040e-3 \end{bmatrix}
\]
and the initial realization of the digital controller
\[
A_C^0 = \begin{bmatrix}
2.3985e+0 & -1.8017e+0 & 4.0317e-1 \\
1 & 0 & 0 \\
0 & 1 & 0
\end{bmatrix}, \quad
B_C^0 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}^T,
\]
\[
C_C^0 = \begin{bmatrix} -7.3591e+1 & 1.4661e+2 & -7.3018e+1 \end{bmatrix}, \quad D_C^0 = 1.2450e+2.
\]
\begin{tabular}{lccccccccc}
Realization & $\rho_1$ & $\hat{\beta}_1^{\min}$ & $\mu_1$ & $\hat{\beta}_{w1}^{\min}$ & $\gamma$ & $\hat{\beta}_e^{\min}$ & $\beta^{\min}$ & $\beta_w^{\min}$ & $\beta_e^{\min}$ \\
\hline
$X_0$ & 2.6644e-9 & 30 & 8.5182e-8 & 23 & 3.1971e+1 & 5 & 26 & 20 & 5 \\
$X_s$ & 4.7588e-6 & 19 & 8.7907e-5 & 13 & 1.8473e+1 & 5 & 15 & 9 & 5 \\
$X_{\mathrm{opt}}$ & 9.5931e-6 & 18 & 1.5229e-4 & 12 & 1.5875e+1 & 4 & 13 & 8 & 4 \\
\end{tabular}
Table 1. Various measures, corresponding estimated minimum bit lengths and true minimum bit lengths for three controller realizations $X_0$, $X_s$ and $X_{\mathrm{opt}}$ of Example 1.

Figure 2. Unit impulse response $y(k)$ for $X_{\mathrm{ideal}}$, 14-bit implemented $X_s$ (eight mantissa bits and five exponent bits) and 13-bit implemented $X_{\mathrm{opt}}$ (eight mantissa bits and four exponent bits) of Example 1.

Figure 3. Unit impulse response $y(k)$ for $X_{\mathrm{ideal}}$, 15-bit implemented $X_s$ (nine mantissa bits and five exponent bits) and 14-bit implemented $X_{\mathrm{opt}}$ (nine mantissa bits and four exponent bits) of Example 1.

The MATLAB routine fminsearch.m was used to solve the optimization problem based on the FWL closed-loop stability measure presented in this paper to obtain an optimal transformation matrix
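\[
T_{\mathrm{opt}} = \begin{bmatrix}
1.8515e+2 & 7.2829e-1 & 9.7266e+0 \\
1.8540e+2 & 1.6951e+1 & -2.3477e+0 \\
1.8566e+2 & 3.3300e+1 & -1.4508e+1
\end{bmatrix}
\]
and the corresponding optimal realization of the digital controller $X_{\mathrm{opt}}$ with
\[
A_C^{\mathrm{opt}} = \begin{bmatrix}
1.0006e+0 & -8.8718e-2 & 9.9092e-2 \\
-2.7168e-2 & 1.0178e+0 & -4.5738e-1 \\
-3.6546e-2 & 3.2513e-2 & 3.8007e-1
\end{bmatrix}, \quad
B_C^{\mathrm{opt}} = \begin{bmatrix} -6.8999e+0 & 9.2711e+1 & 1.2450e+2 \end{bmatrix}^T,
\]
\[
C_C^{\mathrm{opt}} = \begin{bmatrix} -3.6469e-2 & 2.7168e-2 & -6.1334e-1 \end{bmatrix}, \quad D_C^{\mathrm{opt}} = 1.2450e+2.
\]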
Based on the method of the weighted closed-loop eigenvalue sensitivity index (Whidborne and Gu 2002), the MATLAB routine fminsearch.m found a transformation matrix solution
\[
T_s = \begin{bmatrix}
1.8446e+2 & 0 & 0 \\
1.8500e+2 & 2.9433e+0 & 0 \\
1.8553e+2 & 5.9061e+0 & 8.3753e-3
\end{bmatrix}
\]
with the corresponding controller realization $X_s$ given by
\[
A_C^{s} = \begin{bmatrix}
9.9711e-1 & -1.5840e-2 & 1.8305e-5 \\
3.2077e-5 & 9.9558e-1 & -1.1505e-3 \\
-2.8762e-2 & 2.5216e-1 & 4.0584e-1
\end{bmatrix}, \quad
B_C^{s} = \begin{bmatrix} 5.4211e-3 & -3.4074e-1 & 1.2019e+2 \end{bmatrix}^T,
\]
\[
C_C^{s} = \begin{bmatrix} -2.9785e-2 & 2.6087e-1 & -6.1154e-1 \end{bmatrix}, \quad D_C^{s} = 1.2450e+2.
\]
Table 2 summarizes the various measures, the corresponding estimated minimum bit lengths and the true minimum bit lengths for $X_0$, $X_s$ and $X_{\mathrm{opt}}$. The implementation of $X_0$ needs at least 30 bits (25 mantissa bits and four exponent bits) while the implementation of $X_{\mathrm{opt}}$ requires at least 12 bits (seven mantissa bits and four exponent bits). It can be seen that the optimization results in a reduction of 18 bits for the mantissa part. It is interesting to note that the realization $X_s$, while reducing the required $\beta_w^{\min}$ by 16 bits, actually increases the required $\beta_e^{\min}$ by one bit, compared with $X_0$. This is not surprising, since the measure $\Upsilon(X)$ completely neglects the exponent part. Figure 4 compares the unit impulse response of the plant output $y(k)$ for the ideal controller $X_{\mathrm{ideal}}$ with those of the 14-bit implemented $X_s$ (eight mantissa bits and five exponent bits) and the 14-bit implemented $X_{\mathrm{opt}}$ (nine mantissa bits and four exponent bits). It can be seen that the closed-loop system with the 14-bit implemented $X_{\mathrm{opt}}$ is stable while the system with the 14-bit implemented $X_s$ is unstable. Figure 5 compares the unit impulse response of $y(k)$ for $X_{\mathrm{ideal}}$ with those of the 15-bit implemented $X_s$ (nine mantissa bits and five exponent bits) and the 15-bit implemented $X_{\mathrm{opt}}$ (ten mantissa bits and four exponent bits). The performance of the 15-bit implemented $X_{\mathrm{opt}}$ is clearly closer to the ideal performance than that of the 15-bit implemented $X_s$.
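The values printed in Tables 1 and 2 can be cross-checked with a short script. The relations tested below are read off from the tabulated numbers themselves and are an assumption of this sketch, not a quotation of the paper's defining equations: the combined measure equals the mantissa measure divided by the exponent measure, each estimated bit length follows from a base-2 logarithm of the corresponding measure, and each true total bit length is one sign bit plus the mantissa and exponent bits. The names `ROWS` and `check_row` are illustrative.

```python
import math

# Columns as printed: rho1, est. total bits, mu1, est. mantissa bits, gamma,
# est. exponent bits, true total bits, true mantissa bits, true exponent bits.
ROWS = {
    "Example 1, X0": (2.6644e-9, 30, 8.5182e-8, 23, 3.1971e+1, 5, 26, 20, 5),
    "Example 1, Xs": (4.7588e-6, 19, 8.7907e-5, 13, 1.8473e+1, 5, 15, 9, 5),
    "Example 1, Xopt": (9.5931e-6, 18, 1.5229e-4, 12, 1.5875e+1, 4, 13, 8, 4),
    "Example 2, X0": (2.6767e-11, 37, 2.8122e-10, 31, 1.0506e+1, 4, 30, 25, 4),
    "Example 2, Xs": (3.1047e-6, 20, 7.6679e-5, 13, 2.4697e+1, 5, 15, 9, 5),
    "Example 2, Xopt": (5.8446e-6, 19, 8.2771e-5, 13, 1.4162e+1, 4, 12, 7, 4),
}


def check_row(row):
    """Return True if one table row satisfies all the assumed relations."""
    rho1, est_total, mu1, est_w, gamma, est_e, total, w, e = row
    return (
        math.isclose(rho1, mu1 / gamma, rel_tol=1e-3)
        and est_w == -math.floor(math.log2(mu1)) - 1
        and est_e == math.ceil(math.log2(gamma))
        and est_total == -math.floor(math.log2(rho1)) + 1
        and total == 1 + w + e
    )


for name, row in ROWS.items():
    print(name, check_row(row))
```

Every row passes, which also confirms how the flattened table columns should be read. In both examples the estimated mantissa bit lengths are larger than the true ones found by simulation, so the estimates are conservative.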
7. Brief discussion on the direct approach
A limitation of the indirect strategy, one may argue, is that it relies on a fixed control law or transfer function. The direct approach removes this assumption and appears to be a better approach in dealing with the FWL issues. Apart from the excellent work by Liu et al. (1992), we are aware of only one other case of successfully adopting a direct strategy (Yang et al. 2000), where the standard $H_\infty$ control design was extended to include FWL controller parameter perturbations, and a Riccati inequality approach was developed to directly obtain optimal controller realizations satisfying both the $H_\infty$ robustness and FWL closed-loop stability requirements. Except for $H_\infty$ and LQG, it seems to be very difficult to extend various controller design methods to this direct strategy. The indirect approach, however, is very flexible. Controller synthesis is generally a highly complicated task, involving many trade-offs between various conflicting requirements. Even when a direct method can be found, the indirect approach is still useful, as it can be used to further optimize a controller realization obtained with the direct approach.
\begin{tabular}{lccccccccc}
Realization & $\rho_1$ & $\hat{\beta}_1^{\min}$ & $\mu_1$ & $\hat{\beta}_{w1}^{\min}$ & $\gamma$ & $\hat{\beta}_e^{\min}$ & $\beta^{\min}$ & $\beta_w^{\min}$ & $\beta_e^{\min}$ \\
\hline
$X_0$ & 2.6767e-11 & 37 & 2.8122e-10 & 31 & 1.0506e+1 & 4 & 30 & 25 & 4 \\
$X_s$ & 3.1047e-6 & 20 & 7.6679e-5 & 13 & 2.4697e+1 & 5 & 15 & 9 & 5 \\
$X_{\mathrm{opt}}$ & 5.8446e-6 & 19 & 8.2771e-5 & 13 & 1.4162e+1 & 4 & 12 & 7 & 4 \\
\end{tabular}
Table 2. Various measures, corresponding estimated minimum bit lengths and true minimum bit lengths for three controller realizations $X_0$, $X_s$ and $X_{\mathrm{opt}}$ of Example 2.

Figure 4. Unit impulse response $y(k)$ for $X_{\mathrm{ideal}}$, 14-bit implemented $X_s$ (eight mantissa bits and five exponent bits) and 14-bit implemented $X_{\mathrm{opt}}$ (nine mantissa bits and four exponent bits) of Example 2.

Figure 5. Unit impulse response $y(k)$ for $X_{\mathrm{ideal}}$, 15-bit implemented $X_s$ (nine mantissa bits and five exponent bits) and 15-bit implemented $X_{\mathrm{opt}}$ (ten mantissa bits and four exponent bits) of Example 2.

To see where the difficulties are for the direct approach, let us discuss how to extend the work of Liu et al. (1992) to the generic setting. First define the controller realization set
\[
\mathcal{U}_C \triangleq \{ X \mid X \in \mathbb{R}^{(l+n) \times (q+n)},\ X \text{ is a controller realization stabilizing } P(z) \}. \qquad (60)
\]
Assume that a performance index can be formulated to reflect the needs of all the performance requirements, including FWL implementation considerations. Extending the idea of Liu et al. (1992) to this generic setting, the optimization problem† for FWL controller realization design can be defined as
\[
\eta \triangleq \min_{X_0 \in \mathcal{U}_C} \ \min_{\substack{T \in \mathbb{R}^{n \times n} \\ \det T \neq 0}} J(X_0, T). \qquad (61)
\]
The cost function
\[
J(X_0, T) \triangleq \lim_{k \to \infty} E\left[ y^T(k) Q y(k) + u^T(k) R u(k) \right] \qquad (62)
\]
depends on $X_0$ and $T$, where $E[\cdot]$ represents the average value, $y(k)$ is the output of $P(z)$, $u(k)$ is the output of $C(z)$, and $Q$ and $R$ are given matrices. It is easy to see that the problem (61) can be broken into two parts and solved as two coupled optimization problems:
\[
\eta(X_0) \triangleq \min_{\substack{T \in \mathbb{R}^{n \times n} \\ \det T \neq 0}} J(X_0, T), \qquad (63)
\]
\[
\eta = \min_{X_0 \in \mathcal{U}_C} \eta(X_0). \qquad (64)
\]
Provided that the optimization problem (63) can be solved exactly, for example when a closed-form solution of the problem (63) can be obtained, the optimization problem (64) can be tackled and hopefully solved successfully. Apart from a few performance cost functions, how to solve the generic optimization problem (61) is still an open problem. It is also clear that the first part (63) of the optimization problem (61) has the same form as our optimization problem (59). Thus, studies on optimal realizations for a fixed control law, like the one in this paper, may provide useful insights to help solve the more generic optimal realization problem (61).
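The two-level structure of (63) and (64) can be illustrated on a toy problem in which the inner minimization has a closed-form solution. Everything in this sketch is hypothetical: the scalar cost $J(x_0, t) = x_0^2 t^2 + 1/t^2 + (x_0 - 2)^2$ merely stands in for the performance index, with $t \neq 0$ playing the role of the transformation and $x_0$ the role of the control law, and golden-section search is used for the outer problem only because the resulting outer cost is unimodal on the chosen interval.

```python
import math


def inner_min(x0):
    """Closed-form inner solution: min over t != 0 of J(x0, t), for x0 != 0.

    J(x0, t) = x0^2 t^2 + 1/t^2 + (x0 - 2)^2 is minimized at t^4 = 1/x0^2.
    """
    a = x0 * x0
    t_opt = a ** -0.25
    return 2.0 * math.sqrt(a) + (x0 - 2.0) ** 2, t_opt


def golden_section(g, lo, hi, tol=1e-9):
    """Minimize a unimodal function g on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if g(c) < g(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


# Outer problem: minimize the inner optimum over x0.
x0_opt = golden_section(lambda x0: inner_min(x0)[0], 0.1, 3.0)
value, t_opt = inner_min(x0_opt)
print(x0_opt, t_opt, value)
```

For this toy cost the inner optimum is $2|x_0| + (x_0 - 2)^2$, so the outer search settles near $x_0 = 1$ with $t = 1$ and a cost of 3. Without a closed-form inner solution, each outer evaluation would require a full numerical search of the kind used for problem (59).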
8. Conclusions
The closed-loop stability issue of finite-precision realizations has been investigated for digital controllers implemented in floating-point arithmetic. A new computationally tractable FWL closed-loop stability measure has been derived for floating-point controller realizations. Unlike the existing methods, which only consider the mantissa part of the floating-point scheme, the proposed measure takes into account both the exponent and mantissa parts of the floating-point format. It has been shown that this new measure yields a more accurate estimate of the FWL closed-loop stability. Based on this FWL closed-loop stability measure, the optimal controller realization problem has been formulated, which can then be solved using numerical optimization algorithms. Two numerical examples have demonstrated that the proposed design procedure yields computationally efficient controller realizations suitable for FWL floating-point implementation in real-time applications. The idea of considering both the dynamic range and precision of FWL floating-point arithmetic is generic and can be used to deal with similar problems in FWL fixed-point arithmetic and FWL block-floating-point arithmetic. In fact, the implementation of a digital controller should include not only the selection of realizations but also the choice of number representation formats. Further research is currently being conducted to develop a design procedure for choosing an optimal controller realization as well as an appropriate representation scheme for a given control law to achieve the best performance and computational efficiency.
Acknowledgements
J. Wu and S. Chen wish to acknowledge the support
of the United Kingdom Royal Society under a KC
Wong fellowship (RL/ART/CN/XFI/KCW/11949). J.
Wu wishes to acknowledge the support of the
National Natural Science Foundation of China under
Grants 60174026 and 60374002.
†There appeared $\beta_i$ (the fractional wordlength for storing the state variables) in the original problem of Liu et al. (1992). We omit $\beta_i$ here as it has no relevance to our discussion.

References
Bauer, P. H., 1995, Absolute response error bounds for floating point digital filters in state space representation. IEEE Transactions on Circuits Systems II, 42, 610–613.
Bauer, P. H., and Wang, J., 1993, Limit cycle bounds for floating point implementations of second-order recursive digital filters. IEEE Transactions on Circuits Systems II, 40, 493–501.
Beveridge, G. S. G., and Schechter, R. S., 1970, Optimization: Theory and Practice (McGraw-Hill).
Chen, S., and Luk, B. L., 1999, Adaptive simulated annealing for optimization in signal processing applications. Signal Processing, 79, 117–128.
Chen, S., Wu, J., Istepanian, R. S. H., and Chu, J., 1999, Optimizing stability bounds of finite-precision PID controller structures. IEEE Transactions on Automatic Control, 44, 2149–2153.
Fialho, I. J., and Georgiou, T. T., 1994, On stability and performance of sampled-data systems subject to wordlength constraint. IEEE Transactions on Automatic Control, 39, 2476–2481.
Fialho, I. J., and Georgiou, T. T., 2001, Computational algorithms for sparse optimal digital controller realizations. In R. S. H. Istepanian and J. F. Whidborne (Eds) Digital Controller Implementation and Fragility: A Modern Perspective (London: Springer Verlag), pp. 105–121.
Gevers, M., and Li, G., 1993, Parameterizations in Control, Estimation and Filtering Problems: Accuracy Aspects (London: Springer Verlag).
Istepanian, R. S. H., and Whidborne, J. F. (Eds), 2001, Digital Controller Implementation and Fragility: A Modern Perspective (London: Springer Verlag).
Istepanian, R. S. H., Whidborne, J. F., and Bauer, P., 2000, Stability analysis of block floating point digital controllers. In Proceedings of the UKACC International Conference on Control 2000, Cambridge, UK, CD-ROM, 6 pages.
Kalliojärvi, K., and Astola, J., 1996, Roundoff errors in block-floating-point systems. IEEE Transactions on Signal Processing, 44, 783–790.
Kaneko, T., 1973, Limit-cycle oscillations in floating digital filters. IEEE Transactions on Audio Electroacoustics, 21, 100–106.
Keel, L. H., and Bhattacharyya, S. P., 1997, Robust, fragile, or optimal? IEEE Transactions on Automatic Control, 42, 1098–1105.
Laub, A. J., Heath, M. T., Paige, C. C., and Ward, R. C., 1987, Computation of system balancing transformations and other applications of simultaneous diagonalization reduction algorithms. IEEE Transactions on Automatic Control, 32, 115–122.
Li, G., 1998, On the structure of digital controllers with finite word length consideration. IEEE Transactions on Automatic Control, 43, 689–693.
Li, G., and Gevers, M., 1990, Optimal finite precision implementation of a state-estimate feedback controller. IEEE Transactions on Circuits and Systems, CAS-38, 1487–1499.
Li, G., Wu, J., Chen, S., and Zhao, K. Y., 2002, Optimum structures of digital controllers in sampled-data systems: a roundoff noise analysis. IEE Proceedings of the Control Theory and Applications, 149, 247–255.
Liu, B., and Kaneko, T., 1969, Error analysis of digital filters realized with floating point arithmetic. Proceedings of the IEEE, 57, 1735–1747.
Liu, K., Skelton, R., and Grigoriadis, K., 1992, Optimal controllers for finite wordlength implementation. IEEE Transactions on Automatic Control, 37, 1294–1304.
Madievski, A. G., Anderson, B. D. O., and Gevers, M., 1995, Optimum realizations of sampled-data controllers for FWL sensitivity minimization. Automatica, 31, 367–379.
Man, K. F., Tang, K. S., and Kwong, S., 1998, Genetic Algorithms: Concepts and Design (London: Springer-Verlag).
Miller, R. K., Michel, A. N., and Farrell, J. A., 1989, Quantizer effects on steady-state error specifications of digital feedback control systems. IEEE Transactions on Automatic Control, 34, 651–654.
Miller, R. K., Mousa, M. S., and Michel, A. N., 1988, Quantization and overflow effects in digital implementations of linear dynamic controllers. IEEE Transactions on Automatic Control, 33, 698–704.
Molchanov, A. P., and Bauer, P. H., 1995, Robust stability of digital feedback control systems with floating point arithmetic. In Proceedings of the 34th IEEE International Conference on Decision and Control, New Orleans, LA, USA, pp. 4251–4258.
Moroney, P., Willsky, A. S., and Houpt, P. K., 1980, The digital implementation of control compensators: the coefficient wordlength issue. IEEE Transactions on Automatic Control, AC-25, 621–630.
Ralev, K. R., and Bauer, P. H., 1999, Realization of block floating-point digital filters and application to block implementations. IEEE Transactions on Signal Processing, 47, 1076–1086.
Rao, B. D., 1996, Roundoff noise in floating point digital filters. Control and Dynamic Systems, 75, 79–103.
Rink, R. E., and Chong, H. Y., 1979, Performance of state regulator systems with floating point computation. IEEE Transactions on Automatic Control, 24, 411–421.
Whidborne, J. F., and Gu, D.-W., 2002, Optimal finite-precision controller and filter implementations using floating-point arithmetic. In Proceedings of the 15th IFAC World Congress, Barcelona, Spain, CD-ROM, Paper 990.
Whidborne, J. F., Istepanian, R. S. H., and Wu, J., 2001, Reduction of controller fragility by pole sensitivity minimization. IEEE Transactions on Automatic Control, 46, 320–325.
Whidborne, J. F., Wu, J., and Istepanian, R. S. H., 2000, Finite word length stability issues in an $\ell_1$ framework. International Journal of Control, 73, 166–176.
Williamson, D., and Kadiman, K., 1989, Optimal finite wordlength linear quadratic regulation. IEEE Transactions on Automatic Control, 34, 1218–1228.
Wu, J., Chen, S., Li, G., and Chu, J., 2001b, Finite word length implementation for digital reduced order observer based controllers. Developments in Chemical Engineering and Mineral Processing, 9, 41–48.
Wu, J., Chen, S., Li, G., Istepanian, R. S. H., and Chu, J., 2000, Shift and delta operator realizations for digital controllers with finite-word-length considerations. IEE Proceedings of Control Theory and Applications, 147, 664–672.
Wu, J., Chen, S., Li, G., Istepanian, R. S. H., and Chu, J., 2001a, An improved closed-loop stability related measure for finite-precision digital controller realizations. IEEE Transactions on Automatic Control, 46, 1162–1166.
Yang, G.-H., Wang, J. L., and Lin, C., 2000, $H_\infty$ control for linear systems with additive controller gain variations. International Journal of Control, 73, 1500–1506.