The closed-loop stability issue of finite-precision realizations is investigated for digital controllers implemented in three different arithmetic formats, namely fixed-point, floating-point and block-floating-point schemes. It is shown that the controller coefficient perturbations resulting from using different finite word length (FWL) representation schemes possess quite different properties. A unified FWL closed-loop stability measure is derived which is applicable to all the three arithmetic schemes. Unlike the existing works which only take into account the precision of a representation scheme with an assumption of an unlimited dynamic range, both the dynamic range and precision of an arithmetic scheme are considered in this new unified measure. To facilitate the design of optimal finite-precision controller realizations, a computationally tractable FWL closed-loop stability measure is then introduced and the method of computing the value of this measure for a given controller realization is given. For each arithmetic scheme, the optimal controller realization is defined as the solution that maximizes the corresponding measure, and a numerical optimization approach is adopted to solve for the resulting optimal realization problem.
Index Terms: digital controller, finite word length, fixed-point, floating-point, block-floating-point, closed-loop stability, optimization.
Introduction
There has been a growing awareness that finite-precision implementation can seriously affect the actual performance of a designed controller. Modern digital processors have high precision, e.g. 16-bit fixed-point or even 32-bit floating-point processors. At first thought, it may seem that the perturbations resulting from finite-precision computation in the digital controller are so small, compared to the uncertainty within the plant model, that this controller "uncertainty" can simply be ignored. However, it has increasingly been realized that this is not necessarily the case. Due to the FWL effect, a casual controller implementation may degrade the designed closed-loop performance or even destabilize the designed stable closed-loop system if the controller implementation structure is not carefully chosen. The effects of finite-precision computation have become more critical with the growing popularity of robust controller design methods, which focus only on dealing with large plant uncertainty and result in controllers of much higher order and complexity than traditional classical control, as highlighted in the so-called fragility puzzles [1], [2].
It is well known that a control law can be implemented with different realizations and that the parameters of a controller realization are represented by a digital processor of finite bit length in a particular format, such as fixed-point, floating-point or block-floating-point format. In other words, the implementation of a digital controller includes the choices of controller realization and number representation format. The dynamic range of number representation is defined by the integer or exponent part of the fixed-point or floating-point format, respectively, while the accuracy or precision is determined by the fractional or mantissa part, respectively. If the bit length could be infinite, that is, the dynamic range unlimited and the precision infinite, all the different realizations of a controller implemented in any one of the three representation schemes would be exactly equivalent. With a finite bit length, however, different controller realizations have different degrees of "robustness" to FWL errors. This property can be utilized to select "optimal" realizations that optimize some given criteria, and several works [3]-[5] have studied many aspects of FWL digital controller design, particularly the critical issue of FWL closed-loop stability robustness.
Fixed-point arithmetic has the advantages of computational simplicity and efficiency over floating-point arithmetic, but has a much more limited dynamic range and precision for the same bit length. It is therefore not surprising that most previous works have focused on finding optimal controller realizations using fixed-point arithmetic by maximizing some closed-loop stability measures [4]-[14].
With their decreasing price and increasing availability, the use of floating-point processors in controller implementations has increased dramatically. Few studies so far have addressed digital controller implementation in floating-point format with FWL considerations. An exception is the recent work [15], which has studied explicitly the closed-loop stability issue of floating-point digital controller realizations. The block-floating-point format has been well studied in the context of digital filter design [16], [17]. The work [18] has studied finite-precision controller realizations in a block-floating-point format. However, the approach adopted in [18] is to convert the problem into a fixed-point one, and a true block-floating-point FWL closed-loop stability measure has not been reported to date.
In all the previous works dealing with FWL digital controller realizations [4]-[15], [18], various FWL closed-loop stability measures are maximized to produce "optimal" realizations. However, there is a "flaw" here. These measures are only linked to the bits required to implement the fractional part of the fixed-point representation or the mantissa part of the floating-point representation, depending on which format is used. Maximizing these measures, while minimizing the bits required for the fractional or mantissa part, may actually increase the bits required for the integer or exponent part. Thus, the resulting "optimal" controller realizations are not necessarily truly optimal in terms of robustness to FWL effects. It is common sense that a desired controller realization should be balanced, that is, should have a small dynamic range. This is simply because in a digital processor the total bit length has to accommodate the dynamic range first to avoid overflow and/or underflow, and the remaining bits are then used to implement the fractional or mantissa part of a representation scheme. Therefore, given a fixed bit length, a smaller dynamic range means a higher accuracy or precision, and an optimal controller realization should minimize the total bit length required.
The contribution of this paper is threefold. Firstly, a unified FWL closed-loop stability measure is derived which is applicable to all three arithmetic schemes. Secondly and most importantly, this new unified measure accommodates both the dynamic range and precision requirements and is directly linked to the total bit length of a representation format. Thirdly, based on the unified measure, it is convenient to compare optimal controller realizations in different representation schemes and to provide useful information for selecting a suitable controller realization and arithmetic format. The remainder of this paper is organized as follows. Section 2 briefly summarizes the three number representation schemes and highlights the different properties of perturbations resulting from different FWL representation schemes. Section 3 analyses the FWL effect of each arithmetic scheme on closed-loop stability and provides a unified measure for FWL implemented digital controllers. Section 4 defines a computationally tractable FWL closed-loop stability measure for controller realizations implemented in any representation scheme and provides the method of computing its value. In section 5, the optimal FWL controller realization problem is formulated based on the unified measure, and a numerical optimization technique is adopted to solve the resulting optimization problem. Two examples are given in section 6 to demonstrate the effectiveness of the proposed design method and to compare different optimal realizations based on different arithmetic schemes. The paper concludes in section 7.
Representation Schemes
There are three representation schemes available for digital processors to store numerical data in memory and in registers. These are fixed-point, floating-point and block-floating-point representations. All digital processors have finite word lengths, and the precise value of a number represented in one scheme can be different from that in another scheme. Furthermore, the perturbations resulting from different FWL representation schemes have very different properties.
Fixed-Point Representation
The fixed-point format for representing $x \in \mathbb{R}$ with a bit length

$$\beta = 1 + \beta_g + \beta_f \qquad (1)$$

assigns 1 bit for the sign, $\beta_g$ bits for the integer part and $\beta_f$ bits for the fraction part. Denote the sign of
Then, with the two's complement system, the set of all the possible fixed-point numbers that can be represented by the bit length $\beta$ is given by
When no overflow occurs, that is, $x$ lies within the range representable with the bit length $\beta$, the fixed-point quantization operator $Q_1$ is defined as
where $\lfloor \cdot \rfloor$ denotes the floor function, i.e., $\lfloor x \rfloor$ is the closest integer less than or equal to $x \in \mathbb{R}$. The quantization error of fixed-point representation is given by
Consequently, the quantization error is bounded by
Thus, when $x$ is implemented in the fixed-point scheme of $\beta_f$ fraction bits, assuming no overflow, it is perturbed to
Obviously, the perturbation resulting from FWL fixed-point representation is additive.
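The additive nature of the fixed-point perturbation can be illustrated with a minimal Python sketch. It assumes plain truncation with the floor function and no overflow; a round-to-nearest variant would add 0.5 before the floor and halve the bound. The function names `quantize_fixed` and `fixed_error_bound` are hypothetical and chosen only for this illustration.

```python
import math


def quantize_fixed(x: float, frac_bits: int) -> float:
    """Truncate x to a multiple of 2**-frac_bits using the floor function.

    Assumption: no overflow, i.e. the integer part always fits.
    """
    scale = 2 ** frac_bits
    return math.floor(x * scale) / scale


def fixed_error_bound(frac_bits: int) -> float:
    """Absolute error bound of floor truncation: one unit in the last place."""
    return 2.0 ** -frac_bits


if __name__ == "__main__":
    for x in (0.3, -1.7, 2.5):
        q = quantize_fixed(x, 8)
        # The error x - q does not scale with |x|: the perturbation is additive.
        print(x, q, x - q, fixed_error_bound(8))
```

With 8 fraction bits, every value is perturbed by less than $2^{-8}$ regardless of its magnitude, which is what "additive" means here.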
Floating-Point Representation
It is well known that any $x \in \mathbb{R}$ can be expressed uniquely as
where $s \in \{0, 1\}$ is the sign of $x$, $w \in [0, 1)$ is the mantissa of $x$, $e \in \mathbb{Z}$ is the exponent of $x$, and $\mathbb{Z}$ denotes the set of integers. When $x$ is stored in a digital computer of finite $\beta$ bits in a floating-point format, the bits consist of three parts: one bit for $s$, $\beta_w$ bits for $w$ and $\beta_e$ bits for $e$. Obviously, $\beta = 1 + \beta_w + \beta_e$.
Thus the set of all the floating-point numbers that can be represented by the bit length $\beta$ is given by
where $\underline{e}$ and $\overline{e}$ represent the lower and upper limits of the exponent, respectively, and
Note that unlike the fixed-point representation, underflow can occur in the floating-point representation.
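Denote the set of integers between the lower and upper exponent limits as $[\underline{e}, \overline{e}]_{\mathbb{Z}}$. When no underflow or overflow occurs, that is, the exponent of $x$ is within $[\underline{e}, \overline{e}]_{\mathbb{Z}}$, the floating-point quantization operator $Q_2$ is defined by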
where the exponent 
It can be shown easily that the quantization error is bounded by
Thus, when $x$ is implemented in the floating-point format of $\beta_w$ mantissa bits, assuming no underflow or overflow, it is perturbed to
It can be seen that the perturbation resulting from FWL floating-point representation is multiplicative, unlike the perturbation resulting from FWL fixed-point representation, which is additive.
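The multiplicative behaviour can be checked with a small Python sketch. It is an illustration under stated assumptions, not the operator defined above: the mantissa is normalized to $[0.5, 1)$ as returned by `math.frexp`, it is truncated with the floor function, and the exponent range is taken to be unlimited. The name `quantize_float` is hypothetical.

```python
import math


def quantize_float(x: float, mantissa_bits: int) -> float:
    """Truncate the mantissa of x to mantissa_bits bits, keeping the exponent.

    Assumptions: mantissa normalized to [0.5, 1) (math.frexp convention),
    truncation by floor, no exponent underflow or overflow.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(abs(x))            # abs(x) == m * 2**e with 0.5 <= m < 1
    scale = 2 ** mantissa_bits
    m_q = math.floor(m * scale) / scale  # quantized mantissa
    return math.copysign(math.ldexp(m_q, e), x)


if __name__ == "__main__":
    for x in (0.3, -6.0, 1000.3):
        q = quantize_float(x, 8)
        # The relative error stays bounded while the absolute error grows with |x|.
        print(x, q, abs(x - q) / abs(x))
```

Under these assumptions the relative error is below $2^{-(\beta_w - 1)}$ for any magnitude of $x$, so the perturbation scales with the value itself.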
Block-Floating-Point Representation
The fixed-point and floating-point formats are the two basic representation schemes for real numbers. For a group of real numbers stored simultaneously in a digital processor, the so-called block-floating-point format is also available. Suppose that the group of real numbers forms a set $S$. In the block-floating-point format, $S$ is divided into some blocks. The block-floating-point scheme may be viewed as aiming to achieve a trade-off between the simplicity of the fixed-point scheme and the accuracy of the floating-point scheme. The difficulty is how to divide $S$ into suitable blocks. If such a division is done inappropriately, the result obtained can be worse than with the fixed-point scheme.
For illustrative purposes and without loss of generality, consider the case of dividing $S$ into two non-empty subsets $S_1$ and $S_2$, which satisfy $S_1 \cup S_2 = S$ and $S_1 \cap S_2$ is the empty set. Let $\eta_1 \in S_1$ be the element in $S_1$ that has the largest absolute value, and $\eta_2 \in S_2$ be the element in $S_2$ that has the largest absolute value. Then, any $x \in S$ can be expressed uniquely as
where $s \in \{0, 1\}$ is the sign of $x$, $u \in [0, 1)$ is the block mantissa of $x$, and the block exponent of $x$ is
Obviously, all the elements in the same block share the same block exponent value. When all the elements in $S$ are stored in a digital processor of the bit length
in a block-floating-point scheme, the bits are assigned as follows: 1 bit for the sign, $\beta_u$ bits for $u$, which is represented in fixed-point with the two's complement system, and $\beta_h$ bits for the block exponent. Thus the set of all the block-floating-point numbers that can be represented by the bit length $\beta$ is given by
where $\underline{h}$ and $\overline{h}$ represent the lower and upper limits of the block exponent, respectively, and $\overline{h} - \underline{h} = 2^{\beta_h} - 1$. Similar to the floating-point representation, overflow or underflow can occur in the block-floating-point representation.
When no underflow or overflow occurs, that is, the block exponent is within its lower and upper limits, the block-floating-point quantization operator $Q_3$ is defined as
The quantization error of the block-floating-point representation is defined as
Thus, when $x \in S$ is implemented in the block-floating-point format of $\beta_u$ block mantissa bits, assuming no underflow or overflow, it is perturbed to
It can be seen that the perturbation resulting from FWL block-floating-point representation is neither multiplicative nor additive. The perturbation depends on the set $S$ and how $S$ is divided into blocks.
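The dependence on the block division can be seen in a short Python sketch. It is only an illustration under assumptions: the shared block exponent is taken as the smallest power-of-two exponent that brings the largest element of the block into $[0, 1)$, the block mantissa is truncated with the floor function, and the exponent range is unlimited. The names `block_exponent` and `quantize_block` are hypothetical.

```python
import math


def block_exponent(block):
    """Shared exponent e such that every |v| in the block is below 2**e."""
    largest = max(abs(v) for v in block)
    if largest == 0.0:
        return 0
    _, e = math.frexp(largest)   # 2**(e - 1) <= largest < 2**e
    return e


def quantize_block(block, mantissa_bits):
    """Quantize all elements of one block with a common exponent."""
    e = block_exponent(block)
    scale = 2 ** mantissa_bits
    out = []
    for v in block:
        u = math.ldexp(abs(v), -e)            # block mantissa in [0, 1)
        u_q = math.floor(u * scale) / scale   # fixed-point truncation of the mantissa
        out.append(math.copysign(math.ldexp(u_q, e), v))
    return out


if __name__ == "__main__":
    # One block: the small elements inherit the exponent of 5.0 and lose accuracy.
    print(quantize_block([0.3, 5.0, -0.01], 8))
    # Splitting off the large element gives the small ones a better exponent.
    print(quantize_block([0.3, -0.01], 8), quantize_block([5.0], 8))
```

In the single-block case the element $-0.01$ is quantized to zero, whereas after the split it keeps several significant bits. The same coefficients therefore see different perturbations depending on how they are grouped.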
It is easily seen that in each representation format the total bit length always consists of three parts. The sign occupies one bit. The dynamic range of representation is defined by $\beta_g$, $\beta_e$ or $\beta_h$ bits, and the precision of representation is determined by $\beta_f$, $\beta_w$ or $\beta_u$ bits, depending on which scheme is actually chosen. For notational conciseness, we introduce the "generalized" dynamic range word length $\beta_r$ and precision word length $\beta_p$ for the three representation schemes. It is understood that $\beta_r = \beta_g$, $\beta_e$ or $\beta_h$ and $\beta_p = \beta_f$, $\beta_w$ or $\beta_u$, depending on which format is actually used.
Problem Statement
Consider the generic discrete-time closed-loop control system depicted in Figure 1, where the linear time-invariant plant $P$ is described by
which is completely state controllable and observable with $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times p}$ and $C \in \mathbb{R}^{q \times n}$; and the digital controller is described by
with $F \in \mathbb{R}^{m \times m}$, $G \in \mathbb{R}^{m \times q}$, $J \in \mathbb{R}^{p \times m}$, $M \in \mathbb{R}^{p \times q}$ and $H \in \mathbb{R}^{m \times p}$. The output-feedback and observer-based controllers can be unified in this general structure: the controller is an output-feedback controller when $H = 0$; a full-order observer-based controller when $F = A - GC$, $M = 0$ and $H = B$; a reduced-order observer-based controller, otherwise [19], [20].
It is well known that the realizations of the controller are not unique. All the realizations of the controller form the realization set
where $T \in \mathbb{R}^{m \times m}$ is any real-valued nonsingular matrix, called a similarity transformation. Let $w_F = \mathrm{Vec}(F)$, where $\mathrm{Vec}(\cdot)$ denotes the column stacking operator. The vectors
where $N = (m + p)(m + q) + mp$. We also refer to $w$ as a realization of the controller. The stability of the closed-loop system in Figure 1 depends on the eigenvalues of the matrix
where $0$ and $I$ denote the zero and identity matrices of appropriate dimensions, respectively. All the different realizations $w$ have the same set of closed-loop poles if they are implemented with infinite precision. Since the closed-loop system is designed to be stable, the eigenvalues
and the index $\alpha$ of representation formats

$$\alpha = \begin{cases} 1, & \text{fixed-point format is adopted} \\ 2, & \text{floating-point format is adopted} \\ 3, & \text{block-floating-point format is adopted} \end{cases} \qquad (32)$$
Obviously, the index $\alpha$ indicates the actual representation format used. The controller realization $w$ is implemented in format $\alpha$ with $\beta_r$ dynamic range bits, $\beta_p$ precision bits and one sign bit. In the remainder of this paper, it is assumed that if $w$ is stored in the block-floating-point format, it is divided into the "natural" blocks $w_F$, $w_G$, $w_J$, $w_M$ and $w_H$. Let $\eta_F \in w_F$ be the element in $w_F$ which has the largest absolute value. The elements $\eta_G$, $\eta_J$, $\eta_M$ and $\eta_H$ are similarly defined. Denote
with the superscript $T$ denoting the transpose operator.
Firstly, the dynamic range of $\beta_r$ bits must be large enough for $w$. We define a dynamic range measure for realization $w$ with format $\alpha$ as $\gamma(w, \alpha)$
The rationale of this dynamic range measure becomes clear in the following (obvious) propositions. Let $\beta_r^{\min}$ be the smallest dynamic-range bit length that, when used to implement $w$, does not cause overflow or underflow. The minimum required dynamic-range bit length can easily be computed by
where $\lceil \cdot \rceil$ denotes the ceiling function, i.e., $\lceil x \rceil$ is the closest integer greater than or equal to $x \in \mathbb{R}$.
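For the fixed-point case, the idea of a minimum dynamic-range bit length can be sketched in Python as follows. This is an assumption-based illustration and not the paper's own expression: it treats the representable range as symmetric, so a coefficient fits with $b$ integer bits when its magnitude is strictly below $2^b$, and it uses `math.frexp` rather than a logarithm to avoid rounding issues. The name `min_integer_bits` is hypothetical.

```python
import math


def min_integer_bits(coeffs):
    """Smallest number of integer bits b such that all |c| < 2**b.

    Assumption: symmetric range, so the extra negative end point of
    two's complement is ignored (slightly conservative).
    """
    largest = max(abs(c) for c in coeffs)
    if largest == 0.0:
        return 0
    _, e = math.frexp(largest)   # 2**(e - 1) <= largest < 2**e
    return max(e, 0)             # purely fractional values need no integer bits


if __name__ == "__main__":
    print(min_integer_bits([0.3, -5.0, 7.9]))
    print(min_integer_bits([0.2, -0.4]))
```

Only the largest-magnitude coefficient matters here, which is why a realization with one large element forces extra integer bits on all the others.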
When the dynamic range of representation format $\alpha$ is sufficient, according to the results of section 2, $w$ is perturbed to $w + r(w, \alpha) \circ \Delta$ due to the effect of finite $\beta_p$, where
A Tractable FWL Closed-Loop Stability Measure
When the FWL error $\Delta$ is small, from a first-order approximation,
The assumption of small $\Delta$ is usually valid in practical implementations of digital controllers. Generally speaking, there is no rigorous relationship between $\mu_0(w, \alpha)$ and $\mu_1(w, \alpha)$, but $\mu_1(w, \alpha)$ is connected with a lower bound of $\mu_0(w, \alpha)$ in some manner: there are "stable perturbation cubes" larger than $\{\Delta : \|\Delta\|_{\max} \le \mu_1(w, \alpha)\}$, while there is no "stable perturbation cube" larger than $\{\Delta : \|\Delta\|_{\max} \le \mu_0(w, \alpha)\}$ [11], [13]. Hence, in most cases, it is reasonable to take that $\mu_1(w, \alpha) \le \mu_0(w, \alpha)$.
Optimization Procedure
In a given format $\alpha$, different realizations $w$ yield different values of $\rho_1(w, \alpha)$. It is of practical importance to find an "optimal" realization $w_{\mathrm{opt}}(\alpha)$ that maximizes $\rho_1(w, \alpha)$ for the format $\alpha$. The controller implemented with this optimal realization $w_{\mathrm{opt}}(\alpha)$ in format $\alpha$ needs a minimum bit length and has a maximum tolerance to the FWL error. This optimal realization problem is formally defined as
As a closed-form or global analytical solution of this optimal realization problem is not available, a numerical optimization approach is adopted to solve this optimization problem.
where
Considering that $w$ is a function of $T$, $r(w, \alpha)$ and $\gamma(w, \alpha)$ therefore depend on $T$ and $\alpha$, and we define the following optimization criterion in format $\alpha$:
The optimal realization problem (62) can then be posed as the following optimization problem:
Efficient numerical optimization methods exist for solving this optimization problem to provide an optimal similarity transformation $T_{\mathrm{opt}}(\alpha)$. With $T_{\mathrm{opt}}(\alpha)$, the optimal realization $w_{\mathrm{opt}}(\alpha)$ in format $\alpha$ can readily be computed. By setting $\alpha = 1$, 2 and 3, respectively, in the optimization problem (74), we can obtain the optimal fixed-point realization $w_{\mathrm{opt}}(1)$, the optimal floating-point realization $w_{\mathrm{opt}}(2)$ and the optimal block-floating-point realization $w_{\mathrm{opt}}(3)$ for a digital controller.
It is worth reiterating that the optimization problem (74) yields the true optimal controller realization, as the solution $T_{\mathrm{opt}}(\alpha)$ minimizes the required $\beta_p$ as well as $\beta_r$ and therefore minimizes the required total bit length $\beta$. This should be compared with the existing "optimal" realization problems [4]-[15], [18], which only try to minimize the required precision bit length $\beta_p$ and, as a consequence, do not necessarily minimize the required total bit length $\beta$. As the optimization problem (74) is highly nonlinear, global optimization algorithms, such as the genetic algorithm [21], [22] and adaptive simulated annealing [23], [24], can be adopted. Global optimization methods are, however, computationally more demanding. Local optimization algorithms, such as the Rosenbrock and Simplex algorithms [25], [26], are computationally simpler but run a greater risk of attaining only a local solution. Our experience with the optimization problem (74) suggests that using a local search algorithm is often adequate. Unlike optimizing the precision measure $\mu_1(w, 1)$ alone [13], the dynamic range measure $\gamma(w, \alpha)$ in the criterion $\rho_1(w, \alpha)$ helps to bound the solution set, and the cost function (73) appears to behave better.
Design Examples and Result Comparison
Two examples were used to illustrate the design procedure based on the unified FWL closed-loop stability measure and to compare the resulting optimal realizations in different representation schemes.
Example 1.
This example was taken from [13] . The closed-loop system contained a reduced-order observer-based controller. The discrete-time plant was given by
The initial realization of the digital controller was given by
Based on the proposed unified FWL closed-loop stability measure, the optimization problem (74) was formed. Using the MATLAB routine fminsearch.m, this optimization problem was solved for $\alpha = 1$, 2 and 3, respectively, to obtain the optimal similarity transformation in fixed-point format
the optimal similarity transformation in floating-point format
and the optimal similarity transformation in block-floating-point format
Example 2. In this example, the discrete-time plant taken from [4] was given by
The initial realization of the digital controller, which was a modification of the initial output-feedback controller in [4] by a similarity transformation, was given by
Using the same method as for Example 1, the three optimal similarity transformations obtained for $\alpha = 1$, 2 and 3 were, respectively,

Table 1 lists the values of the measures $\mu_1$, $\rho_1$ and $\gamma$ in the three different representation schemes, together with the corresponding estimated bit lengths, for the initial realization $w_0$, the optimal fixed-point realization $w_{\mathrm{opt}}(1)$, the optimal floating-point realization $w_{\mathrm{opt}}(2)$ and the optimal block-floating-point realization $w_{\mathrm{opt}}(3)$ of Example 1. Table 2 does the same for Example 2. In these two tables, the various estimated bit lengths were computed from their respective measure values according to (36), (52) and (61). Some observations can readily be made from the results in Tables 1 and 2.
As far as the robustness of FWL closed-loop stability is concerned, given an arbitrary realization, floating-point representation is not necessarily better than fixed-point or block-floating-point representation. For example, floating-point is the best format to implement the initial realization $w_0$ of Example 1, while fixed-point is the best format to implement $w_0$ of Example 2. In fact, for Example 2, we had deliberately chosen $w_0$ as the transformation of the initial controller realization in [4] by a similarity transformation matrix to favor a fixed-point implementation. However, as expected, the optimal floating-point realization $w_{\mathrm{opt}}(2)$ implemented in floating-point format is always the best in terms of robustness to FWL errors.
Also, the results in Table 1 show that fixed-point format is better than block-floating-point format to implement $w_{\mathrm{opt}}(\alpha)$ of Example 1 for $1 \le \alpha \le 3$, while the results of Table 2 indicate that the opposite is true for Example 2. This simply confirms the fact that the performance of the block-floating-point scheme critically depends on how $w$ is divided into blocks. With a proper division, the block-floating-point scheme should beat the fixed-point scheme in terms of robustness to FWL errors. The results also show that the proposed optimization procedure is very effective. This can be seen by comparing the values of the FWL closed-loop stability measure for $w_0$ and $w_{\mathrm{opt}}(\alpha)$ implemented in the same format $\alpha$.
It is obvious that the true minimum dynamic-range bit length $\beta_r^{\min}(w, \alpha)$ for a realization $w$ in format $\alpha$ can be obtained directly by examining the elements of $w$. The true minimum precision bit length $\beta_p^{\min}(w, \alpha)$, however, can only be obtained through simulation. That is, starting from a very large $\beta_p$, reduce $\beta_p$ by one bit and check the closed-loop stability. The process is repeated until closed-loop instability appears at $\beta_p = \beta_{pu}$. Then $\beta_p^{\min} = \beta_{pu} + 1$. Table 3

Notice that any realization $w$ in the realization set implemented in infinite precision (unlimited $\beta_r$ and infinite $\beta_p$) will achieve the exact performance of the infinite-precision implemented $w_0$, which is the designed controller performance. For this reason, the infinite-precision implemented $w_0$ is referred to as the ideal controller realization $w_{\mathrm{ideal}}$. Figure 2 compares the unit impulse response of the first plant output $y_1(k)$ of Example 1 for the ideal controller $w_{\mathrm{ideal}}$ with those of the 22-bit fixed-point implemented $w_{\mathrm{opt}}(1)$ (3 integer bits and 18 fractional bits), the 22-bit floating-point implemented $w_{\mathrm{opt}}(2)$ (5 exponent bits and 16 mantissa bits) and the 22-bit block-floating-point implemented $w_{\mathrm{opt}}(3)$ (2 block exponent bits and 19 block mantissa bits).
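The simulation-based search for the true minimum precision bit length described above can be sketched in Python. This is a generic illustration under assumptions: it uses floor truncation in fixed-point as the quantizer, takes the stability check as a user-supplied function, and demonstrates the loop on a hypothetical scalar system (pole $1.2 - g$ with a single controller gain $g$) rather than on the paper's examples. All function names are hypothetical.

```python
import math


def quantize_fixed(x, frac_bits):
    """Floor truncation to frac_bits fraction bits (no overflow assumed)."""
    scale = 2 ** frac_bits
    return math.floor(x * scale) / scale


def min_precision_bits(coeffs, is_stable, start_bits=32):
    """Reduce the precision one bit at a time until instability appears.

    Returns beta_pu + 1, where beta_pu is the first (largest) bit length
    found to be unstable. Returns 0 if no tested bit length is unstable.
    """
    bits = start_bits
    while bits >= 0 and is_stable([quantize_fixed(c, bits) for c in coeffs]):
        bits -= 1
    return bits + 1


def toy_loop_is_stable(coeffs):
    """Hypothetical scalar closed loop: x(k+1) = (1.2 - g) x(k)."""
    g = coeffs[0]
    return abs(1.2 - g) < 1.0


if __name__ == "__main__":
    # Designed gain g = 0.21 places the pole at 0.99, close to the unit circle.
    print(min_precision_bits([0.21], toy_loop_is_stable))
```

For a real controller, `is_stable` would quantize the realization $w$ in the chosen format and test whether all closed-loop eigenvalues lie inside the unit circle.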
Conclusions
We have presented a unified closed-loop stability measure for finite-precision digital controller realizations implemented in different representation schemes. The proposed computationally tractable measure takes into account both the dynamic range and precision of arithmetic schemes, and therefore provides a true measure of the FWL characteristics of controller realizations. Based on this unified FWL closed-loop stability measure, the optimal controller realization problems in different representation formats have been formulated, and these can easily be solved using standard numerical optimization algorithms.
Simulation results have confirmed that the optimal floating-point controller realization implemented in floating-point format is the best in terms of robustness to FWL errors. The results have also shown that, with a proper division of controller coefficients into blocks, block-floating-point implemented realizations can have better robustness to FWL errors than fixed-point implemented ones, but choosing an appropriate division of blocks is difficult. These results agree with the common understanding of the number representation formats. It is well known that fixed-point format is the best in terms of hardware cost, arithmetic operation simplicity and execution speed, while floating-point format is the worst. The proposed design procedure provides the designer with useful quantitative information regarding finite-precision computational properties, namely robustness to FWL errors and the estimated minimum bit length for guaranteeing closed-loop stability. This allows the designer to choose an optimal controller realization in an appropriate representation scheme to achieve the best computational efficiency and closed-loop controller performance.
