Formalization of Fixed-Point Arithmetic in HOL
Behzad Akbarpour, Sofiène Tahar, Abdelkader Dekdouk
Dept. of Electrical and Computer Engineering, Concordia University




This paper addresses the formalization of fixed-point arithmetic in
higher-order logic. We encoded the fixed-point number system and
specified the different quantization modes in fixed-point arithmetic,
such as the directed and even quantization modes. We also considered
the formalization of exception detection and handling, such as overflow
and invalid operation. An error analysis is then performed to check the
correctness of the quantized result after carrying out basic arithmetic
operations, such as addition, subtraction, multiplication and division,
against their mathematical counterparts. Finally, we showed by an
example how this formalization can be used to enable the verification of
the transition from the floating-point to the fixed-point algorithmic
level in the signal processing design flow.
Keywords: Fixed-Point Arithmetic, Floating-Point Arithmetic, Theorem
Proving, HOL
1 Introduction
Modern signal processing chips, such as integrated cable modems and wireless
multimedia terminals, are described with algorithms in floating-point
precision. Often, the architectural style with which these algorithms are
implemented is precision-limited, and relies on a fixed-point representation.
This requires a translation of the specification from floating-point to
fixed-point precision. This implementation is optimized following some
application-specific trade-offs such as speed, cost, area and power
consumption of the chip. The optimization task is tedious and error prone due
to the effects of quantization noise introduced by the limited precision of
fixed-point representation. An overview of a conventional digital signal
processing (DSP) design flow is shown in Figure 1.

Figure 1: DSP Design Flow
Usually the conformance of the fixed-point implementation with respect
to the floating-point specification is verified by simulation techniques which
cannot cover the entire input space yielded by the floating-point
representation. The objective of this work is to formalize the fixed-point
arithmetic in higher-order logic as a basis for checking the correctness of
the implementation of DSP designs against higher level algorithmic
descriptions in floating-point and fixed-point representations.
Unlike floating-point arithmetic which is standardized in IEEE-754 [18]
and IEEE-854 [19], current fixed-point arithmetic does not follow any
particular standard and depends on the tool and the language used to design
the DSP chip. Examples of such tools are SPW (Cadence) [7], Matlab-Simulink
(Mathworks) [25], CoCentric (Synopsys) [37], and DSP Station (Mentor
Graphics) [27]. For instance, in SPW (Signal Processing Worksystem), a
fixed-point number is defined as a binary string and a set of attributes.
Attributes specify how the binary string is interpreted using three
arguments for the total number of bits, the number of integer bits, and the
sign format. For arithmetic operations, it supports three kinds of exceptions
such as loss-of-sign or overflow, two overflow modes, and five quantization
modes. In Matlab Simulink Fixed-Point Blockset [26], fixed-point numbers
are stored in data types that are characterized by their word size (up to
128 bits), a radix point, and whether they are signed or unsigned. The radix
point is used to support integers, fractionals, and generalized fixed-point
data types. The Matlab Blockset provides four quantization modes
corresponding to those supported by SPW. It also supports saturation and
wrapping to deal with overflow for all fixed-point data types. Another
example is the Synopsys CoCentric tool, which uses fixed-point as described
in the SystemC language [33]. It supports signed and unsigned fixed-point
data types, as well as limited precision (53 bits mantissa) fixed-point,
called fast fixed-point, to speed up simulation. SystemC supports seven
quantization modes, of which four correspond exactly to the quantization
modes of SPW. The other three modes are specific to SystemC and are not
supported by the other tools. SystemC supports five overflow modes covering
those of SPW. With the objective of providing a general methodology for the
formalization and verification of fixed-point arithmetic using higher-order
logic, we define in this paper a complete common set of fixed-point
arithmetic as supported by most of the DSP tools, in particular SPW and
SystemC.
Based on higher-order logic, we propose to encode a fixed-point number
by a pair composed of a Boolean word, and a triplet indicating the word
length, the length of the integer portion, and the sign format. Then, we
formalize the concepts of valuation and quantization as functions that
convert respectively a fixed-point number to a real number and vice versa,
taking into account different quantization and overflow modes. Fixed-point
arithmetic operations are formalized as functions performing operations on
the real numbers corresponding to the fixed-point operands and then applying
the quantization on the real number result. Finally, we prove various lemmas
regarding the error analysis of the fixed-point quantization and correctness
of the basic operations like addition, multiplication, and division. The
higher-order logic formalization and proof were done using the HOL theorem
prover [12]. They were developed into a full fixed-point arithmetic library,
which was recently included in the latest release of HOL (HOL4,
Kananaskis-2).
The rest of the paper is organized as follows: Section 2 gives a review
of work related to the formalization of floating-point arithmetic, some of
which directly influenced our work. Section 3 describes the fixed-point
arithmetic definitions adopted in this paper including the format of the
fixed-point numbers, arithmetic operations, exceptions detection and their
handling, and the different overflow and quantization modes. Section 4
describes in detail their formalization in HOL. In Section 5, we discuss the
verification of basic fixed-point arithmetic operations, such as addition and
multiplication. Section 6 presents an illustrative example on how this
formalization can be used through the modeling and verification of an
Integrator circuit. Finally, Section 7 concludes the paper.
2 Related Work
There exist several related works in the open literature on the formalization
and verification of IEEE standard based floating-point arithmetic. For
instance, Barrett [2] specified parts of the IEEE-754 standard in Z, and Miner
[29] formalized the IEEE-854 floating-point standard in PVS. The latter
defined the relation between floating-point numbers and real numbers,
rounding, and some arithmetic operations on both finite and infinite operands.
He used this formalization to verify abstract mathematical descriptions of
the main operations and their relation to the corresponding floating-point
implementations. His work was one of the earliest on the formalization of
floating-point standards using theorem proving. His formal specification was
then used by Miner and Leathrum [30] to verify in PVS a general class of
IEEE compliant subtractive division algorithms.
Carreno [8] formalized the same IEEE-854 standard in HOL. He interpreted
the lexical descriptions of the standard into mathematical conditional
descriptions and organized them in tables, which were then formalized in
HOL. He discussed different standard aspects such as precisions, exceptions
and traps, and many other arithmetic operations such as addition,
multiplication, and square-root of floating-point numbers.
Harrison [13] constructed the real numbers in HOL. He then developed
in HOL a generic floating-point library [14] to define the most fundamental
terms of the IEEE-754 standard and to prove the corresponding correctness
analysis lemmas. He used this library to formalize and verify floating-point
algorithms of complex arithmetic operations such as the square root, the
exponential function [15], and the transcendental functions [16] against their
abstract mathematical counterparts. He also used the floating-point library
for the verification of the class of division algorithms used in the Intel
IA-64 architecture [17].
Moore et al. [31] have verified the AMD-K5 floating-point division algorithm
using the ACL2 theorem prover. Also, Russinoff [35] has developed
a floating-point library for the ACL2 prover and applied it successfully to
verify the floating-point multiplication, division, and square root algorithms
of the AMD-K5 and AMD Athlon processors.
Aagaard and Seger [1] combined BDD-based model-checking and theorem
proving techniques in the Voss hardware verification system to verify the
IEEE compliance of the gate-level implementation of a floating-point
multiplier. [...] Pro processor's floating-point execution unit at the gate
level using a combination of model-checking and theorem proving. Leeser et
al. [24] verified a subtractive radix-2 square root algorithm and its hardware
implementation using the higher-order logic theorem proving system Nuprl.
Chen and Bryant [10] used word-level SMV to verify a floating-point adder.
Cornea-Hasegan [9] used iterative approaches and mathematical proofs to
verify the correctness of the IEEE floating-point square root, divide, and
remainder algorithms.
More recently, Daumas et al. [11] have presented a generic library for
reasoning about floating-point numbers within the Coq system. This library
was then used in the verification of IEEE-compliant floating-point arithmetic
algorithms [5] and hardware units [6]. Berg et al. [3] have formally verified
a theory of IEEE rounding presented in [32] using the theorem prover PVS.
They have used a formal definition of rounding based on Miner's
formalization of the standard [29]. This theory was then used to prove the
correctness of a fully IEEE compliant floating-point unit used in the VAMP
processor [4]. Sawada and Gamboa [36] formally verified the correctness of a
floating-point square root algorithm used in the IBM Power4™ processor. The
verification was carried out with the ACL2(r) theorem prover which is an
extension of the ACL2 theorem prover that performs reasoning on real numbers
using non-standard analysis. The proof required the analysis of the
approximation error on Chebyshev series by proving Taylor's theorem. Kaivola
et al. [20, 21, 22] presented the formal verification of the floating-point
multiplication, division, [...]
The verification was carried out using the Forte verification framework, a
combined model-checking and theorem-proving system built on top of the
Voss system. Model checking was done via symbolic trajectory evaluation
(STE), and theorem proving was done in the ThmTac proof tool.
While all of the above works are concerned with floating-point representation
and arithmetic, there is no report in the open literature on any
machine-checked formalization of properties of fixed-point arithmetic.
Therefore, the formalization presented in this paper is, to the best of our
knowledge, the first of its kind. Our formalization of the fixed-point
arithmetic has been inspired mostly by the work done by Harrison [15] and
Carreno [8] on floating-point. Harrison's work was more oriented towards
verification purposes. Indeed, we used a set of lemmas analogous to his to
check the validity of operation results and to carry out the error analysis
of the quantized fixed-point result. For exception handling, which is not
covered by Harrison [15], we followed Carreno [8], who formalized
floating-point exceptions and their handling in more detail.
3 Fixed-Point Arithmetic
In this section we describe the fixed-point arithmetic definitions on which we
base our formalization. While we tried to keep these definitions as general
as possible, the fixed-point numbers format, arithmetic operations, overflow
and quantization modes, and exception handling adopted are to some extent
influenced by the fixed-point arithmetic defined by Cadence SPW [7] and
Synopsys SystemC [33].
3.1 Fixed-Point Numbers
A fixed-point number has a fixed number of binary digits and a fixed position
for the decimal point with respect to that sequence of digits. Fixed-point
numbers can be either unsigned (always positive) or signed (in two's
complement representation). For example, consider the case of four bits being
used to represent the fixed-point numbers. If the numbers are unsigned and if
the decimal point or, more properly, the binary point is fixed at the position
after the second digit (XX.XX), the representable real values range from 0.0
to 3.75. In two's complement format, the most significant bit is the sign
bit. The remaining bits specify the magnitude. If four bits represent the
fixed-point numbers, and the binary point is fixed at the position after the
second digit following the sign bit (SXX.X), the real values range from -4.0
to +3.5.
Fixed-point numbers are expressed as a pair consisting of a binary string
and a set of attributes, (Binary String, Attributes). The attributes specify
how the binary string is interpreted. Generally, the attributes are specified
in the following format:
(wl, iwl, sign)    (1)
which consists of the following parameters:
• wl: Total word length, specifying the total number of bits used to
represent the fixed-point binary string, including integer bits, fractional
bits, and sign bit, if any. Word length must be in the range of 1 to 256.
• iwl: Integer word length, specifying the number of integer bits (the
number of bits to the left of the binary point, excluding the sign bit, if
any). If this number is negative, repeated leading sign bits or zeros are
added to generate the equivalent binary value. If this number is greater
than the total word length, trailing zeroes are added to generate the
equivalent binary value.
• sign: A letter specifying the sign format: "u" for unsigned, and "t"
for two's complement.
Example: According to the above definitions, the real value -0.75 is
represented by (111101, (6, 3, t)). If we consider the same bit string with
unsigned attributes (111101, (6, 3, u)), then the equivalent number is 111.101
or +7.625. On the other hand, (111101, (6, -3, u)) represents the value
.000111101 which is +0.119140625.
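The three interpretations in the example can be checked with a short Python sketch. The function name fxp_value and the use of exact rationals are assumptions made for illustration only; they are not part of SPW, SystemC, or the HOL library.

```python
from fractions import Fraction

def fxp_value(bits: str, wl: int, iwl: int, sign: str) -> Fraction:
    """Real value of a bit string under the attributes (wl, iwl, sign).

    Hypothetical helper: sign is "u" (unsigned) or "t" (two's complement).
    """
    if len(bits) != wl or set(bits) - {"0", "1"}:
        raise ValueError("bit string must consist of exactly wl binary digits")
    if sign not in ("u", "t"):
        raise ValueError("sign must be 'u' or 't'")
    raw = int(bits, 2)
    if sign == "t":
        # One bit is used for the sign, the rest after iwl are fractional.
        frac_bits = wl - iwl - 1
        if bits[0] == "1":
            raw -= 1 << wl  # two's complement: the MSB carries negative weight
    else:
        frac_bits = wl - iwl
    # Scale by 2^(-frac_bits); a negative iwl simply yields more fractional bits.
    return Fraction(raw) * Fraction(2) ** (-frac_bits)

print(fxp_value("111101", 6, 3, "t"))          # -3/4
print(fxp_value("111101", 6, 3, "u"))          # 61/8
print(float(fxp_value("111101", 6, -3, "u")))  # 0.119140625
```

Exact rationals are used so that the comparison with the values quoted in the example does not depend on binary floating-point rounding.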
3.2 Fixed-Point Operations
A DSP design tool usually provides a library including basic fixed-point
signal processing blocks such as adders, multipliers, delay blocks, and vector
blocks. It also supports fixed-point hardware blocks such as multiplexers,
buffers, inverters, flip-flops, bit manipulation and general-purpose
combinational logic blocks. These blocks accurately model the behavior of
fixed-point digital signal processing systems. In this paper, we will focus on
the arithmetic and logic operations, but the idea can be generalized to the
remaining operations. Operations performed on fixed-point data types are done
using arbitrary and full precision. After the operation is complete, the
resulting operand is cast to fit the fixed-point data type object. The casting
operation applies the quantization behavior of the target object to the new
value and assigns the new value to the target object. Then, the appropriate
overflow behavior is applied to the result of the process which gives the
final value. In addition to the parameters corresponding to the input operands
and output result, the arithmetic operations take specific parameters defining
the overflow and quantization (loss of precision) modes. These parameters are
as follows:
• q_mode: Quantization mode. This parameter determines the behavior
of the fixed-point operations when the result generates more precision
in the least significant bits (LSB) than is available.
• o_mode: Overflow mode. This parameter determines the behavior of
the fixed-point operations when the result generates more precision in
the most significant bits (MSB) than is available.
• n_bits: Number of saturated bits. This parameter is only used for
overflow mode and specifies how many bits will be saturated if a
saturation behavior is specified and an overflow occurs.
Example: Consider a block that serves as a primitive fixed-point multiplier,
which truncates the results when loss of precision occurs and wraps the result
when overflow occurs. We can make a call to the multiplier routine through
the function fxpMul (Wrap | Truncate, In1, In2, Out), in which In1 and In2
are the input fixed-point operands, Out is a parameter corresponding to
the output attributes, and Wrap and Truncate indicate the overflow and
quantization modes, respectively.
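The compute-exactly-then-cast behavior described above can be sketched in Python for the truncate-and-wrap case. This is only an illustration under stated assumptions: the function name fxp_mul_wrap_truncate is hypothetical, the output format is assumed to be signed with a non-negative number of fractional bits, and operands are passed as exact rational values rather than bit strings.

```python
import math
from fractions import Fraction

def fxp_mul_wrap_truncate(a: Fraction, b: Fraction, wl: int, iwl: int) -> Fraction:
    """Multiply exactly, then cast to a signed (wl, iwl, t) output format.

    Hypothetical sketch: truncation handles extra LSBs, wrap-around
    handles extra MSBs.
    """
    frac_bits = wl - iwl - 1
    if wl < 1 or frac_bits < 0:
        raise ValueError("sketch assumes wl >= 1 and at least zero fractional bits")
    exact = a * b  # full-precision product, no rounding yet
    # Truncation: express in units of the output resolution and drop the
    # extra LSBs (floor, i.e. towards minus infinity).
    scaled = math.floor(exact * 2 ** frac_bits)
    # Wrap-around: keep only the wl low-order bits, read as two's complement.
    half = 2 ** (wl - 1)
    wrapped = (scaled + half) % (2 * half) - half
    return Fraction(wrapped, 2 ** frac_bits)

# Output format (4, 2, t): range -4.0 .. +3.5 in steps of 0.5.
print(fxp_mul_wrap_truncate(Fraction(3, 2), Fraction(5, 4), 4, 2))  # 3/2
print(fxp_mul_wrap_truncate(Fraction(3), Fraction(2), 4, 2))        # -2
```

In the first call the exact product 1.875 loses precision and is truncated to 1.5. In the second call the exact product 6 exceeds the representable range and wraps to -2.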
3.2.1 Fixed-Point Exception Handling
Fixed-point arithmetic operations that do not compute and return an exact
result resort to an exception-handling procedure. This procedure is controlled
by the exception flags. There are three kinds of exceptions that can be tested
[7]:
• Loss of Sign: The result was negative but the result storage area was
unsigned. Zero is stored.
• Overflow: The result was too big to be represented in the result
storage area. The overflow mode determines the returned value.
• Invalid: No result can be meaningfully represented (e.g., divide by
zero). This error can also occur if the fixed-point number itself is
invalid.
3.2.2 Fixed-Point Quantization Modes
Quantization effects are used to determine what happens to the LSBs of a
fixed-point type when more bits of precision are required than are available.
The quantization modes are listed in Table 1.
Table 1. Fixed-Point Quantization Modes
Quantization Mode                 Name
Quantization to Plus Infinity     RND
Quantization to Zero              RND_ZERO
Quantization to Minus Infinity    RND_MIN_INF
Quantization to Infinity          RND_INF
Convergent Quantization           RND_CONV
Truncation                        TRN
Truncation to Zero                TRN_ZERO
Figure 2 shows the behavior of each quantization mode. The X axis is
the result of the previous arithmetic operation and the Y axis is the value
after quantization. The diagonal line represents the ideal number
representation given infinite bits. The small horizontal lines show the effect
of the quantization. Any value of the X axis within the range of the line will
be converted to the value of the Y axis. The symbol q in the figure refers to
the quantization step, that is, the resolution of the data type. Each
non-integer value on the X axis is located in a quantization interval
surrounded by two successive integer multiples of q as its closest
representable quantized numbers, one greater and one smaller than the original
value. If the value is exactly in the middle of the quantization interval,
then the two closest representable numbers are equally distant from the
original value. As shown in this figure, modes RND, RND_ZERO, RND_MIN_INF,
RND_INF, and RND_CONV will quantize a value to the closest representable
number if the two nearest representable numbers are not equally distant from
the original value. Otherwise, quantization towards plus infinity, to zero,
towards minus infinity, towards plus infinity if positive or minus infinity if
negative, and towards nearest even will be performed, respectively (Figure
2 (a-e)). The TRN mode is the default for fixed-point types and will be
used if no other value is specified. The result is always quantized towards
minus infinity (Figure 2 (f)). In other words, the result value is the first
representable number lower than the original value. Finally, for TRN_ZERO
the result is quantized towards zero.

Figure 2: The Behavior of Fixed-Point Quantization Modes
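The tie-breaking rules of the seven modes can be made concrete with a small Python sketch. It works on exact rational values and a quantization step q, and ignores overflow. The function name quantize and the string mode names are assumptions for illustration; "nearest even" is read here as the even multiple of q.

```python
import math
from fractions import Fraction

def quantize(x: Fraction, q: Fraction, mode: str) -> Fraction:
    """Quantize x to an integer multiple of the step q (hypothetical sketch)."""
    lo = math.floor(x / q)  # index of the representable value just below x
    if x == lo * q:
        return x            # already representable, nothing to do
    hi = lo + 1             # index of the representable value just above x
    if mode == "TRN":       # always towards minus infinity
        return lo * q
    if mode == "TRN_ZERO":  # always towards zero
        return (lo if x > 0 else hi) * q
    below, above = x - lo * q, hi * q - x
    if below != above:      # not a tie: every RND mode picks the nearest value
        return (lo if below < above else hi) * q
    tie_break = {
        "RND": hi,                              # towards plus infinity
        "RND_ZERO": lo if x > 0 else hi,        # towards zero
        "RND_MIN_INF": lo,                      # towards minus infinity
        "RND_INF": hi if x > 0 else lo,         # away from zero
        "RND_CONV": lo if lo % 2 == 0 else hi,  # towards the even multiple of q
    }
    if mode not in tie_break:
        raise ValueError(f"unknown quantization mode: {mode}")
    return tie_break[mode] * q

q = Fraction(1, 4)
for mode in ("RND", "RND_ZERO", "RND_MIN_INF", "RND_INF", "RND_CONV", "TRN", "TRN_ZERO"):
    print(mode, quantize(Fraction(3, 8), q, mode), quantize(Fraction(-3, 8), q, mode))
```

The inputs 3/8 and -3/8 lie exactly halfway between two multiples of 1/4, so the loop shows how each mode resolves a tie for positive and negative values.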
3.2.3 Fixed-Point Overflow Modes
In addition to quantization modes, we can use overflow modes to approximate
a higher range for fixed-point operations. Usually, overflow occurs when the
result of an operation is too large or too small for the available bit range.
Specific overflow modes can then be implemented to reduce the loss of data.
Overflow modes are specified by the o_mode and n_bits parameters, and are
listed in Table 2.
Table 2. Fixed-Point Overflow Modes
Overflow Mode                   Name
Saturation                      SAT
Saturation to Zero              SAT_ZERO
Symmetrical Saturation          SAT_SYM
Wrap-Around                     WRAP
Sign Magnitude Wrap-Around      WRAP_SM
Figure 3 shows the behavior of each overflow mode for a 3 bit fixed-point
data type. The diagonal line represents the ideal value if infinite bits are
available for representation. The dots represent the values of the result. The
X axis is the original value and the Y axis is the result. From this figure, it
can be seen that MAX = 3 and MIN = -4 for a 3 bit fixed-point data type.
The SAT mode will convert the specified value to MAX for an overflow or
MIN for an underflow condition (Figure 3 (a)). The SAT_ZERO mode will
set the result to 0 for any input value that is outside the representable range
of the fixed-point type. If the result value is greater than MAX or smaller
than MIN, the result will be 0 (Figure 3 (b)). In the SAT_SYM mode, positive
overflow will generate MAX and negative overflow will generate -MAX
for signed numbers or MIN for unsigned numbers (Figure 3 (c)). With the
WRAP mode, the value of an arithmetic operand will wrap around from
MAX to MIN as MAX is reached. There are two different cases within this
mode. The first is with the n_bits parameter set to 0 or having a default
value of 0. All bits except for the deleted bits are copied to the result
number (Figure 3 (d)). The second is when the n_bits parameter is a nonzero
value. In this case the specified number of most significant bits of the result
number are saturated with preservation of the original sign, the other bits
are simply copied. Positive numbers remain positive and negative numbers
remain negative. A graph showing this behavior with n_bits = 1 is given in
Figure 3 (e). Note that positive numbers wrap around to 0 while negative
values wrap around to -1. The WRAP_SM overflow mode uses sign magnitude
wrapping. This overflow mode behaves in two different styles depending
on the value of the n_bits parameter. When n_bits is 0, no bits are saturated.
This mode will first delete any MSB bits that are outside the result word
length. The sign bit of the result is set to the value of the least significant
deleted bit. If the most significant remaining bit is different from the
original MSB, then all the remaining bits are inverted. If the MSBs are the
same, the other bits are copied from the original value to the result value.
A graph showing the result of this overflow mode is provided in Figure 3 (f).
As the value of X increases, the value of Y increases to MAX and then slowly
starts to decrease until MIN is reached. The result is a sawtooth like
waveform. With n_bits greater than 0, n_bits MSB bits are saturated to 1. A
graph showing this behavior with n_bits = 1 is given in Figure 3 (g). Note that
while the graph looks somewhat like a sawtooth waveform, positive numbers
do not dip below 0 and negative numbers do not cross -1 [33].

Figure 3: The Behavior of Fixed-Point Overflow Modes. Panels: (a) SAT,
(b) SAT_ZERO, (c) SAT_SYM, (d) WRAP, n_bits = 0, (e) WRAP, n_bits = 1,
(f) WRAP_SM, n_bits = 0, (g) WRAP_SM, n_bits = 1.
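For the simpler overflow modes, the behavior described for the 3 bit signed data type can be reproduced with a short Python sketch on integers. The function name apply_overflow is hypothetical, only signed formats are handled, and only SAT, SAT_ZERO, SAT_SYM, and WRAP with n_bits = 0 are covered; WRAP with saturated bits and WRAP_SM are deliberately left out.

```python
def apply_overflow(v: int, wl: int, mode: str) -> int:
    """Map the integer v into the signed two's complement range of wl bits.

    Hypothetical sketch covering SAT, SAT_ZERO, SAT_SYM and WRAP (n_bits = 0).
    """
    max_v = 2 ** (wl - 1) - 1  # MAX, e.g. 3 for wl = 3
    min_v = -(2 ** (wl - 1))   # MIN, e.g. -4 for wl = 3
    if min_v <= v <= max_v:
        return v               # no overflow, value is kept
    if mode == "SAT":
        return max_v if v > max_v else min_v
    if mode == "SAT_ZERO":
        return 0
    if mode == "SAT_SYM":
        return max_v if v > max_v else -max_v
    if mode == "WRAP":
        # Keep the wl low-order bits and reinterpret them as two's complement.
        return (v - min_v) % 2 ** wl + min_v
    raise ValueError(f"unsupported overflow mode: {mode}")

for mode in ("SAT", "SAT_ZERO", "SAT_SYM", "WRAP"):
    print(mode, [apply_overflow(v, 3, mode) for v in range(-6, 7)])
```

Printing the results for inputs from -6 to 6 gives the sequence of dots that the corresponding panel of Figure 3 plots for each of these four modes.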
4 Formalizing Fixed-Point Arithmetic in HOL
In this section, we present the formalization of the fixed-point arithmetic in
higher-order logic, based on the general purpose HOL theorem prover. The
HOL system supports both forward and backward proofs. The forward proof
style applies inference rules to existing theorems to obtain new theorems and
eventually the desired theorem. Backward or goal oriented proofs start with
the goal to be proven. Tactics are applied to the goal and subgoals until the
goal is decomposed into simpler existing theorems or axioms. The system's
basic language includes the natural numbers and Boolean type. It also
includes other specific extensions like the reals library [13], which proved
to be essential for our fixed-point arithmetic formalization. Table 3
summarizes some of the HOL symbols used in this paper and their meanings [12].
Table 3. HOL Symbols
HOL Symbol    Standard Symbol    Meaning
@x. t         εx. t              An x such that t(x) holds
λx. t         λx. t              Function that maps x to t(x)
&             (none)             Natural map operator (N → R)
¬t            ¬t                 Not t
¬x            -x                 Unary negation of x
inv (x)       x^(-1)             Multiplicative inverse of x
abs (x)       |x|                Absolute value of x
x pow n       x^n                Real x raised to natural number power n
m EXP n       m^n                Natural number m raised to exponent n
The HOL type system does not support subtypes, so the real numbers
(R) have formally a different type from the natural numbers (N). Therefore,
the unary operator ampersand (&) is used to map between them. Thus the
real number numerals can be written as &0, &1, etc. [15].
4.1 Fixed-Point Numbers Representation
The actual fixed-point numbers are represented in HOL by a pair of elements
representing the binary string and the set of attributes. The extractors for
the two fields of a fixed-point number are defined as follows:
⊢def string (s,a) = s
⊢def attrib (s,a) = a
The binary string is treated as a Boolean word (type: bool word). For
example, the bit string 1010 is represented by WORD [T;F;T;F]. In this way,
we use the definitions and theorems already available in the HOL word library
[39] to facilitate the manipulation of binary words. The attributes are
represented by a triplet of natural numbers for the total number of bits, the
integer bits and the sign format.
⊢def wordlength (w,iw,s) = w
⊢def intbits (w,iw,s) = iw
⊢def sign (w,iw,s) = s

⊢def is_signed X = (sign X = 1)
⊢def is_unsigned X = (sign X = 0)
The number of digits on the right hand side of the binary point of a
fixed-point number is defined as fracbits. It can be derived as the difference
between the total number of bits and the number of integer bits, less the sign
bit for signed numbers:
⊢def fracbits X =
  if (is_unsigned X) then (wordlength X - intbits X)
  else (wordlength X - intbits X - 1)
Two useful derived predicates test the validity of a set of attributes and a
fixed-point number based on the definition in Section 3.1. In a valid set of
attributes, the wordlength should be in the range of 1 to 256, the sign can be
either 0 or 1, and the number of integer bits is less than or equal to the
wordlength. A valid fixed-point number must have a valid set of attributes and
the length of its binary string must be equal to the wordlength given by its
attributes.
⊢def validAttr X =
  wordlength X > 0 ∧ wordlength X < 257 ∧
  sign X < 2 ∧ wordlength X ≥ intbits X
⊢def is_valid a =
  validAttr (attrib a) ∧ (WORDLEN (string a) = wordlength (attrib a))
where WORDLEN is a predefined function of the HOL word library, which returns
the size of a word.
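As an informal cross-check of the attribute extractors and validity predicates, the following Python sketch mirrors them on plain tuples. The class name Attr and the snake_case function names are illustrative assumptions; the subtraction in fracbits is clamped at zero to imitate subtraction on natural numbers, and a list of Booleans stands in for a Boolean word.

```python
from typing import NamedTuple, Sequence

class Attr(NamedTuple):
    """Attribute triplet; sign is 0 for unsigned and 1 for signed."""
    wordlength: int
    intbits: int
    sign: int

def fracbits(X: Attr) -> int:
    # Natural-number subtraction never goes below zero, hence the clamp.
    return max(0, X.wordlength - X.intbits - (1 if X.sign == 1 else 0))

def valid_attr(X: Attr) -> bool:
    return (0 < X.wordlength < 257
            and X.sign in (0, 1)
            and 0 <= X.intbits <= X.wordlength)

def is_valid(bits: Sequence[bool], X: Attr) -> bool:
    # A valid number has valid attributes and a word of matching length.
    return valid_attr(X) and len(bits) == X.wordlength

print(fracbits(Attr(6, 3, 1)), fracbits(Attr(6, 3, 0)))   # 2 3
print(is_valid([True, False, True, False], Attr(4, 2, 0)))  # True
print(is_valid([True, False], Attr(4, 2, 0)))               # False
```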
4.2 Fixed-Point Type
Now we define the actual HOL type for the fixed-point numbers. The type is
defined to be in bijection with the appropriate subset of (bool word × N^3),
with the bijections written in HOL as fxp : (bool word × N^3) → fxp, and
defxp : fxp → (bool word × N^3). The bijection maps the set of all elements of
type (bool word × N^3) to the set of valid fixed-point numbers specified by
the function is_valid as defined in the previous section. For this purpose, we
make use of built-in facilities in HOL for defining new bijection types [38].
A similar technique was used in [15] for defining type bijections for the
floating-point numbers (float, defloat) in HOL.
fxp_tybij =
  ⊢ (∀a. fxp (defxp a) = a) ∧ (∀r. is_valid r = (defxp (fxp r) = r))
We specialize the previous functions and predicates to the fxp type, as follows:
⊢def String a = string (defxp a)
⊢def Attrib a = attrib (defxp a)
⊢def Wordlength a = wordlength (Attrib a)
⊢def Intbits a = intbits (Attrib a)
⊢def Fracbits a = fracbits (Attrib a)
⊢def Sign a = sign (Attrib a)
⊢def Issigned a = is_signed (Attrib a)
⊢def Isunsigned a = is_unsigned (Attrib a)
⊢def Isvalid a = is_valid (defxp a)
Note that we start the names of the functions manipulating fixed-point
numbers with capital letters to distinguish them from those taking pairs and
triplets as arguments.
4.3 Fixed-Point Valuation
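Now we specify the real number valuation of fixed-point numbers. We use two
separate formulas for unsigned and signed numbers. For an unsigned number,

  value = \frac{1}{2^{M}} \sum_{n=0}^{N-1} 2^{n} v_{n}    (2)

and for a signed number in two's complement format,

  value = \frac{1}{2^{M}} \left[ -2^{N-1} v_{N-1} + \sum_{n=0}^{N-2} 2^{n} v_{n} \right]    (3)

where v_n represents the n-th bit of the binary string in the fixed-point
number¹, and M and N are respectively fracbits and wordlength. In HOL, we
define the valuation function as follows:
⊢def value a =
  if (Isunsigned a) then &(BNVAL (String a)) / 2 pow Fracbits a
  else (&(BNVAL (String a)) - &((2 EXP Wordlength a) *
        BV (MSB (String a)))) / 2 pow Fracbits a
where BNVAL is a function which returns the numeric value of a Boolean word,
BV is a function for mapping between a single bit and a number, and MSB is a
constant for the most significant bit of a word, available in the HOL word
library.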
We also define the real value of the smallest (MIN) and largest (MAX)
representable numbers for a given set of attributes. The maximum is defined
for both signed and unsigned numbers by:
MAX = 2^{a} - 2^{-b}    (4)
where a is the intbits and b the fracbits. The minimum value for unsigned
numbers is zero and for signed numbers is computed using the following formula:
MIN = -2^{a}    (5)
Thereafter, we obtain the corresponding functions in HOL.
⊢def MAX X = 2 pow intbits X - inv (2 pow fracbits X)
⊢def MIN X = if (is_unsigned X) then 0 else ¬(2 pow intbits X)
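The MAX and MIN definitions can be checked against the four-bit examples of Section 3.1 with a few lines of Python. The function names fxp_max and fxp_min are hypothetical, and exact rationals stand in for HOL reals.

```python
from fractions import Fraction

def fxp_max(intbits: int, fracbits: int) -> Fraction:
    """Largest representable value: 2^intbits - 2^(-fracbits)."""
    return Fraction(2) ** intbits - Fraction(1, 2 ** fracbits)

def fxp_min(intbits: int, signed: bool) -> Fraction:
    """Smallest representable value: 0 if unsigned, -2^intbits if signed."""
    return -(Fraction(2) ** intbits) if signed else Fraction(0)

# Unsigned XX.XX: two integer bits and two fractional bits.
print(fxp_min(2, False), fxp_max(2, 2))  # 0 15/4
# Signed SXX.X: two integer bits and one fractional bit.
print(fxp_min(2, True), fxp_max(2, 1))   # -4 7/2
```

The printed bounds agree with the ranges 0.0 to 3.75 and -4.0 to +3.5 stated for the unsigned and signed four-bit formats.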
The constants for the smallest (bottomfxp) and largest (topfxp) representable
fixed-point numbers for a given set of attributes are defined as follows:
⊢def topfxp X =
  if (is_unsigned X) then fxp (WORD (REPLICATE (wordlength X) T),X)
  else fxp (WCAT (WORD [F],WORD (REPLICATE (wordlength X - 1) T)),X)
⊢def bottomfxp X =
  if (is_unsigned X) then fxp (WORD (REPLICATE (wordlength X) F),X)
  else fxp (WCAT (WORD [T],WORD (REPLICATE (wordlength X - 1) F)),X)
where WCAT denotes the concatenation of two words, and REPLICATE makes
a list consisting of a value replicated a specified number of times, which are
predefined functions in HOL.
¹ We adopt the convention that bits are indexed from the right hand side.
4.4 Exception Handling
Operations on fixed-point numbers can signal exceptions as described in
Section 3.2.1. These are declared as a new HOL data type.
⊢def Exception = no_except | overflow | invalid | loss_sign
where no_except is reserved for the case without exception.
Five overflow modes are also represented via an enumerated type definition.
⊢def overflow_mode = SAT | SAT_ZERO | SAT_SYM | WRAP | WRAP_SM
According to the definition of overflow modes in Section 3.2.3 for Saturation,
if the number is greater than MAX or less than MIN, we return topfxp and
bottomfxp, as the closest representable values to the right result,
respectively. For Saturation to Zero overflow, we will return zero in any
case. For Symmetrical Saturation, if the number is greater than MAX, we return
topfxp. If the number is less than MIN, we return the two's complement of the
maximum value, defined by the function minustopfxp for signed, and bottomfxp
for unsigned numbers, respectively. For Wrap-around and Sign magnitude, we
must first convert the real number to a binary format. Then we discard the
extra bits according to the output attributes, and saturate the required bits
based on the parameter n_bits. The details are defined as functions
WRAP_AROUND and WRAP_AROUND_SM. Therefore, we define the fixed-point overflow
function in HOL as follows:
`
def
fxp_overflow X o_mode n_bits x =
if (x > MAX X) then
if (o_mode = SAT) then topfxp X
else if (o_mode = SAT_ZERO) then
fxp (WORD (REPLICATE (wordlength X) F),X)
else if (o_mode = SAT_SYM) then topfxp X
else if (o_mode = WRAP) then
WRAP_AROUND X n_bits x
else WRAP_AROUND_SM X n_bits x
else if (x < MIN X) then
if (o_mode = SAT) then bottomfxp X
else if (o_mode = SAT_ZERO) then
fxp (WORD (REPLICATE (wordlength X) F),X)
else if (o_mode = SAT_SYM) then
if (is_unsigned X) then bottomfxp X
else minustopfxp X
else if (o_mode = WRAP) then
WRAP_AROUND X n_bits x
else WRAP_AROUND_SM X n_bits x
else Null
4 FORMALIZING FIXED-POINT ARITHMETIC IN HOL 18




Null = @a. : (Isvalid a)
Note that if the number is in the representable range of the given attributes,
i.e., its value is neither greater than MAX nor less than MIN, then overflow is
meaningless and Null is returned as the result.
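The saturation and wrap-around cases described above can be illustrated with a small executable model. The following Python sketch is not the HOL definition: it represents values as exact fractions, covers only SAT, SAT_ZERO, SAT_SYM, and WRAP with no saturated bits (n_bits = 0), and uses the hypothetical helper names value_range and overflow. Returning None stands in for Null.

```python
from fractions import Fraction
import math


def value_range(wordlength, fracbits, signed):
    """Return (MIN, MAX) of the representable values as exact fractions."""
    scale = 2 ** fracbits
    if signed:
        return (Fraction(-2 ** (wordlength - 1), scale),
                Fraction(2 ** (wordlength - 1) - 1, scale))
    return Fraction(0), Fraction(2 ** wordlength - 1, scale)


def overflow(x, wordlength, fracbits, signed, mode):
    """Map an out-of-range real x to a representable value.

    Returns None (playing the role of Null) when x is already in range.
    Assumption: WRAP is modelled with n_bits = 0; WRAP_SM is not covered.
    """
    lo, hi = value_range(wordlength, fracbits, signed)
    x = Fraction(x)
    if lo <= x <= hi:
        return None
    if mode == "SAT":
        return hi if x > hi else lo
    if mode == "SAT_ZERO":
        return Fraction(0)
    if mode == "SAT_SYM":
        if x > hi:
            return hi
        # Negated maximum for signed formats, bottom value for unsigned ones.
        return -hi if signed else lo
    if mode == "WRAP":
        modulus = 2 ** wordlength
        # Keep only the low-order wordlength bits of the scaled integer.
        p = math.floor(x * 2 ** fracbits) % modulus
        if signed and p >= modulus // 2:
            p -= modulus  # reinterpret the bit pattern as two's complement
        return Fraction(p, 2 ** fracbits)
    raise ValueError(f"mode not covered by this sketch: {mode}")
```

For a signed 4-bit word without fractional bits (range −8 to 7), the input 9 saturates to 7 under SAT and wraps to −7 under WRAP, while −9 maps to −7 under SAT_SYM.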
4.5 Quantization
Fixed-point quantization takes an infinitely precise real number and converts it
into a fixed-point number. Seven quantization modes are specified in Section 3.2.2:
RND | RND_ZERO | RND_MIN_INF | RND_INF | RND_CONV | TRN | TRN_ZERO
Then we define the fixed-point quantization operation by a function, which is
defined case by case on the quantization modes as follows:
⊢def
fxp_quantize X q_mode x =
if (q_mode = RND) then
closest value (λa. value a ≥ x)
{a | (Isvalid a) ∧ (Attrib a = X)} x
else if (q_mode = RND_ZERO) then
closest value (λa. abs (value a) ≤ abs x)
{a | (Isvalid a) ∧ (Attrib a = X)} x
else if (q_mode = RND_MIN_INF) then
closest value (λa. value a ≤ x)
{a | (Isvalid a) ∧ (Attrib a = X)} x
else if (q_mode = RND_INF) then
closest value
(λa. (if 0 ≤ x then value a ≥ x else value a ≤ x))
{a | (Isvalid a) ∧ (Attrib a = X)} x
else if (q_mode = RND_CONV) then
closest value (λa. LSB (String a) = F)
{a | (Isvalid a) ∧ (Attrib a = X)} x
else if (q_mode = TRN) then
closest value (λa. T)
{a | (Isvalid a) ∧ (Attrib a = X) ∧ (value a ≤ x)} x
else closest value (λa. T)
{a | (Isvalid a) ∧ (Attrib a = X) ∧
(abs (value a) ≤ abs x)} x
The fixed-point quantization function takes as arguments a real number, a
quantization mode, and a set of output attributes, and returns the corresponding
fixed-point number. Similar to the floating-point case [15], its definition is based
on the following predicate, which states that a is an element of the set s that
provides a best approximation to x, assuming a valuation function v:
⊢def
is_closest v s x a =
((a IN s) ∧ ∀b. (b IN s) ⇒ (abs (v a − x) ≤ abs (v b − x)))
However, we still need to define a function that picks out a best approximation
in case there is more than one closest number, based on a given property such as
evenness. This can be done in HOL as follows:
⊢def
closest v p s x =
@a. ((is_closest v s x a) ∧
((∃b. (is_closest v s x b) ∧ (p b)) ⇒ (p a)))
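To make the role of the tie-breaking property p concrete, the following Python sketch mimics closest over an explicitly enumerated candidate set and then builds the seven quantization modes on top of it. It is an illustration under stated assumptions, not the HOL definition: the names representable and quantize are hypothetical, the Hilbert choice operator is replaced by "take the first suitable candidate", the tie-breaking directions follow the usual reading of the mode names (for example, RND breaks ties upward), and x is assumed to lie within the representable range.

```python
from fractions import Fraction


def representable(wordlength, fracbits, signed):
    """All values p / 2**fracbits that fit in the given word (ascending)."""
    if signed:
        ints = range(-2 ** (wordlength - 1), 2 ** (wordlength - 1))
    else:
        ints = range(0, 2 ** wordlength)
    return [Fraction(p, 2 ** fracbits) for p in ints]


def closest(pred, candidates, x):
    """Nearest candidate to x; among equally near ones prefer pred(a)."""
    best = min(abs(a - x) for a in candidates)
    ties = [a for a in candidates if abs(a - x) == best]
    preferred = [a for a in ties if pred(a)]
    return (preferred or ties)[0]


def quantize(x, wordlength, fracbits, signed, mode):
    """Quantize an in-range real x according to the named mode."""
    x = Fraction(x)
    s = representable(wordlength, fracbits, signed)
    scale = 2 ** fracbits
    if mode == "RND":
        return closest(lambda a: a >= x, s, x)
    if mode == "RND_ZERO":
        return closest(lambda a: abs(a) <= abs(x), s, x)
    if mode == "RND_MIN_INF":
        return closest(lambda a: a <= x, s, x)
    if mode == "RND_INF":
        return closest(lambda a: a >= x if x >= 0 else a <= x, s, x)
    if mode == "RND_CONV":
        # An even scaled integer corresponds to a least significant bit of 0.
        return closest(lambda a: (a * scale) % 2 == 0, s, x)
    if mode == "TRN":
        return closest(lambda a: True, [a for a in s if a <= x], x)
    if mode == "TRN_ZERO":
        return closest(lambda a: True, [a for a in s if abs(a) <= abs(x)], x)
    raise ValueError(f"unknown quantization mode: {mode}")
```

With a signed 4-bit word and one fractional bit, the value 1/4 lies exactly halfway between 0 and 1/2, so the rounding modes differ only in which of the two they select; the truncation modes instead restrict the candidate set before the nearest element is chosen.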




⊢def
fxp_round X o_mode q_mode n_bits x =
if (x > MAX X ∨ x < MIN X) then
((fxp_overflow X o_mode n_bits x),overflow)
else ((fxp_quantize X q_mode x),no_except)
where fxp_overflow is the fixed-point overflow function defined in the previous
section, which supports all overflow modes, and fxp_quantize is the fixed-point
quantization function, which supports all quantization modes. The fixed-point
rounding function takes as arguments a real number, a set of output attributes,
the quantization and overflow modes, and the number of saturated bits. It returns
a fixed-point number and an exception flag. The function first checks for overflow;
in case of overflow, it returns the result based on the overflow mode and sets the
exception flag to overflow. Otherwise, it performs the quantization based on the
quantization mode and sets the exception flag to no_except.
4.6 Fixed-Point Arithmetic Operations
Fixed-point arithmetic operations such as addition or multiplication take two
fixed-point input operands and store the result into a third. The attributes of the
inputs and output need not match one another. Both unsigned and two's complement
inputs and output are allowed. The result is formatted into the output as specified
by the output attributes and by the overflow and loss of precision mode
parameters. In our formalization, we first deal with exceptional cases such as
invalid operation and loss of sign. If any of the input numbers is invalid, then the
result is Null and the exception flag invalid is raised. If the result is negative but
the output is unsigned, then zero is returned and the exception flag loss_sign is
raised. Also, in the case of division by zero, the output value is forced to zero and
the invalid flag is raised. Otherwise, we take the real values of the input arguments,
perform the operation with infinite precision, and then quantize the result
according to the desired quantization and overflow modes. Formally, the operations
for addition, subtraction, multiplication, and division are defined as follows:
⊢def
fxpAdd X o_mode q_mode n_bits a b =
if ¬(Isvalid a ∧ Isvalid b) then (Null,invalid)
else if (value a + value b < 0 ∧ is_unsigned X) then
(fxp (WORD (REPLICATE (wordlength X) F),X),loss_sign)
else fxp_round X o_mode q_mode n_bits (value a + value b)
⊢def
fxpSub X o_mode q_mode n_bits a b =
if ¬(Isvalid a ∧ Isvalid b) then (Null,invalid)
else if (value a − value b < 0 ∧ is_unsigned X) then
(fxp (WORD (REPLICATE (wordlength X) F),X),loss_sign)
else fxp_round X o_mode q_mode n_bits (value a − value b)
⊢def
fxpMul X o_mode q_mode n_bits a b =
if ¬(Isvalid a ∧ Isvalid b) then (Null,invalid)
else if (value a * value b < 0 ∧ is_unsigned X) then
(fxp (WORD (REPLICATE (wordlength X) F),X),loss_sign)
else fxp_round X o_mode q_mode n_bits (value a * value b)
⊢def
fxpDiv X o_mode q_mode n_bits a b =
if ¬(Isvalid a ∧ Isvalid b) then (Null,invalid)
else if (value b = 0) then
(fxp (WORD (REPLICATE (wordlength X) F),X),invalid)
else if (value a / value b < 0 ∧ is_unsigned X) then
(fxp (WORD (REPLICATE (wordlength X) F),X),loss_sign)
else fxp_round X o_mode q_mode n_bits (value a / value b)
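The order of the checks in these definitions (invalid inputs, division by zero, loss of sign, then rounding) can be traced with a small executable model. The Python sketch below is only an approximation of the HOL functions: Attr, Fxp, fxp_round, and fxp_op are hypothetical names, a fixed-point number is stored as a plain integer rather than a Boolean word, and the rounding step is fixed to SAT overflow with TRN quantization instead of taking the mode parameters.

```python
from dataclasses import dataclass
from fractions import Fraction
import math
import operator


@dataclass(frozen=True)
class Attr:
    wordlength: int
    fracbits: int
    signed: bool

    def bounds(self):
        """Smallest and largest stored integer for this format."""
        if self.signed:
            return -2 ** (self.wordlength - 1), 2 ** (self.wordlength - 1) - 1
        return 0, 2 ** self.wordlength - 1


@dataclass(frozen=True)
class Fxp:
    raw: int     # stored integer; the real value is raw / 2**fracbits
    attr: Attr

    def is_valid(self):
        lo, hi = self.attr.bounds()
        return lo <= self.raw <= hi

    def value(self):
        return Fraction(self.raw, 2 ** self.attr.fracbits)


def fxp_round(X, x):
    """Assumed modes: saturate on overflow, truncate toward minus infinity."""
    lo, hi = X.bounds()
    scaled = x * 2 ** X.fracbits
    if scaled > hi:
        return Fxp(hi, X), "overflow"
    if scaled < lo:
        return Fxp(lo, X), "overflow"
    return Fxp(math.floor(scaled), X), "no_except"


OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.truediv}


def fxp_op(op, X, a, b):
    """Apply op to a and b exactly, then format the result with attributes X."""
    if not (a.is_valid() and b.is_valid()):
        return None, "invalid"          # None plays the role of Null
    if op == "div" and b.value() == 0:
        return Fxp(0, X), "invalid"
    exact = OPS[op](a.value(), b.value())
    if exact < 0 and not X.signed:
        return Fxp(0, X), "loss_sign"
    return fxp_round(X, exact)
```

For example, with an 8-bit signed format with 4 fractional bits, adding 1.5 and −2.5 yields −1 with no exception, whereas requesting the same sum in an unsigned output format yields zero together with the loss_sign flag.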
5 Verification of Fixed-Point Operations
According to the discussion in Section 4.3, each fixed-point number has a
corresponding real number value. The correctness of a fixed-point operation can be
specified by comparing its output with the true mathematical result, using the
valuation function value that converts a fixed-point number to an infinitely precise
number. For example, the correctness of a fixed-point adder fxpAdd is specified by
comparing it with its ideal counterpart +. That is, for each pair of fixed-point
numbers (a,b), we compare value (a) + value (b) and value (fxpAdd (a,b)), as
depicted in Figure 4.
Figure 4: Correctness Criteria for Fixed-Point Addition
For this purpose, we define the error resulting from quantizing a real number
to a fixed-point value as follows:
⊢def
fxperror X o_mode q_mode n_bits x =
value (FST (fxp_round X o_mode q_mode n_bits x)) − x
and then establish the correctness theorems for all four fixed-point arithmetic
operations.
Theorem 1: FXP_ADD_THM
⊢ (Isvalid a) ∧ (Isvalid b) ∧ validAttr (X) ⇒
(Isvalid (FST (fxpAdd (X) o_mode q_mode n_bits a b))) ∧
(value (FST (fxpAdd (X) o_mode q_mode n_bits a b)) =
value (a) + value (b) +
(fxperror (X) o_mode q_mode n_bits (value (a) + value (b))))
Theorem 2: FXP_SUB_THM
⊢ (Isvalid a) ∧ (Isvalid b) ∧ validAttr (X) ⇒
(Isvalid (FST (fxpSub X o_mode q_mode n_bits a b))) ∧
(value (FST (fxpSub X o_mode q_mode n_bits a b)) =
value (a) − value (b) +
(fxperror X o_mode q_mode n_bits (value a − value b)))
Theorem 3: FXP_MUL_THM
⊢ (Isvalid a) ∧ (Isvalid b) ∧ validAttr (X) ⇒
(Isvalid (FST (fxpMul X o_mode q_mode n_bits a b))) ∧
(value (FST (fxpMul X o_mode q_mode n_bits a b)) =
(value a * value b) +
(fxperror X o_mode q_mode n_bits (value a * value b)))
Theorem 4: FXP_DIV_THM
⊢ (Isvalid a) ∧ (Isvalid b) ∧ validAttr (X) ⇒
(Isvalid (FST (fxpDiv X o_mode q_mode n_bits a b))) ∧
(value (FST (fxpDiv X o_mode q_mode n_bits a b)) =
(value a / value b) +
(fxperror X o_mode q_mode n_bits (value a / value b)))
The theorems are composed of two parts. The first part is about the validity of
the fixed-point arithmetic operation output and states that if the input fixed-point
numbers and the output attributes are valid, then the result of the fixed-point
operation is valid. The second part relates the result of the fixed-point
arithmetic operation to the real result based on the corresponding error function.
To prove these main theorems, a number of lemmas have been established. We
first proved lemmas concerning the approximation of a real number with a
fixed-point number. We proved that in a finite non-empty set of fixed-point numbers,
we can find the best approximation to a real number based on a given valuation
function (Lemma 1).
Lemma 1: FXP_IS_CLOSEST_EXISTS
⊢ FINITE (s) ⇒ ¬(s = EMPTY) ⇒ ∃(a: fxp). is_closest v s x a
Then, we proved that the chosen best approximation to a real number satisfying
a property p from a finite and non-empty set of fixed-point numbers is unique
(Lemma 2), is itself a member of the set (Lemma 3), and is itself the best
approximation of the real number (Lemma 4).
Lemma 2: FXP_CLOSEST_IS_EVERYTHING
⊢ FINITE (s) ⇒ ¬(s = EMPTY) ⇒
is_closest v s x (closest v p s x) ∧
((∃b. is_closest v s x b ∧ p b) ⇒ p (closest v p s x))
Lemma 3: FXP_CLOSEST_IN_SET
⊢ FINITE (s) ⇒ ¬(s = EMPTY) ⇒ (closest v p s x) IN s
Lemma 4: FXP_CLOSEST_IS_CLOSEST
⊢ FINITE (s) ⇒ ¬(s = EMPTY) ⇒ is_closest v s x (closest v p s x)
Finally, we proved that the chosen best approximation to a real number
satisfying a property p from the set of all valid fixed-point numbers with given
attributes is itself a valid fixed-point number (Lemma 5).
Lemma 5: IS_VALID_CLOSEST
⊢ (validAttr X) ⇒
Isvalid (closest v p {a | Isvalid a ∧ ((Attrib a) = X)} x)
In addition, we proved that the set of all valid fixed-point numbers with given
attributes is finite (Lemma 6).
Lemma 6: FINITE_VALID_ATTRIB
⊢ FINITE {a | Isvalid a ∧ (Attrib a = X)}
The proof of this lemma is somewhat involved. For this purpose, we made use of
some built-in theorems about finite sets in the HOL pred_sets library [28]. Among
these are the two fundamental theorems FINITE_EMPTY and FINITE_INSERT,
which state that the empty set is finite and that inserting an element into a
finite set yields a finite set. Other theorems state that the union of two finite
sets (FINITE_UNION), the image of a function on a finite set (IMAGE_FINITE),
a singleton set² (FINITE_SING), the cross combination of two finite sets
(FINITE_CROSS), and any subset of a finite set (SUBSET_FINITE) is itself a
finite set. Using these theorems together with the definition of a valid
fixed-point number helped us to break down the proof of the finiteness of all valid
fixed-point numbers into the proof of finiteness of the set of all Boolean words
with a given word length (WORD_FINITE) and of the set of all natural numbers less
than a given value (FINITE_COUNT). These last lemmas are proved by induction on
the word length of the Boolean word and the maximum limit of the natural numbers,
respectively.
We also proved that the set of all valid fixed-point numbers is nonempty
(Lemma 7).
Lemma 7: IS_VALID_NONEMPTY
⊢ (validAttr X) ⇒ ¬({a | Isvalid a ∧ (Attrib a = X)} = EMPTY)
Finally, we proved that the result of quantizing a real number, which is in
the range representable by given valid attributes, is a valid fixed-point number
(Lemma 8).
Lemma 8: IS_VALID_QUANTIZATION
⊢ (validAttr X) ⇒ Isvalid (FST (fxp_round X o_mode q_mode n_bits x))
The validity of the quantization directly implies the validity of the fixed-point
operation output, and this completes the proof of the first parts of the theorems.
The second parts of the theorems are proved using the properties of real arithmetic
in HOL and rewriting with the definitions of the fxpAdd, fxpSub, fxpMul, fxpDiv,
and fxperror functions.
The second main theorem on fixed-point error analysis concerns bounding the
quantization error. The error can be bounded in absolute terms as follows:
² A set that contains precisely one element.
Theorem 5: FXP_ERROR_BOUND_THM
⊢ (validAttr X) ∧ ¬(x > MAX (X)) ∧ ¬(x < MIN (X)) ⇒
abs (fxperror X o_mode q_mode n_bits x) ≤ inv (&2 pow fracbits X)
According to this theorem, the error in quantizing a real number that is in
the range representable by a given set of attributes X is bounded by the quantity
$1/2^{\mathrm{fracbits}(X)}$. This theorem is valid for all fixed-point quantization
modes. However, for the RND, RND_ZERO, RND_MIN_INF, RND_INF, and RND_CONV
modes, which quantize to the nearest representable value, the error can be bounded
by $1/2^{\mathrm{fracbits}(X)+1}$ by extending the theorem.
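To explain the theorem, we consider the following fact that relates the definition
of the fixed-point numbers to the rationals.
An N-bit binary word, when interpreted as an unsigned fixed-point number,
can take on values from a subset P of the non-negative rationals given by
$$P = \{\, p/2^{b} \mid 0 \le p \le 2^{N} - 1,\ p \in \mathbb{Z} \,\} \qquad (6)$$
Similarly, when interpreted as a signed two's complement fixed-point number, it
can take on values from a subset P of the rationals given by
$$P = \{\, p/2^{b} \mid -2^{N-1} \le p \le 2^{N-1} - 1,\ p \in \mathbb{Z} \,\} \qquad (7)$$
Note that P contains $2^{N}$ elements and b represents the fractional bits in each case.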
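As a quick sanity check of this fact, the set P can be enumerated for a small word. The Python sketch below uses the hypothetical helper representable_values and assumes the usual two's complement range $-2^{N-1} \le p \le 2^{N-1} - 1$ for the signed case; it is meant only to illustrate the counting and spacing argument.

```python
from fractions import Fraction


def representable_values(N, b, signed):
    """Enumerate P = { p / 2**b } for an N-bit word with b fractional bits."""
    if signed:
        ps = range(-2 ** (N - 1), 2 ** (N - 1))   # two's complement integers
    else:
        ps = range(0, 2 ** N)                     # unsigned integers
    return [Fraction(p, 2 ** b) for p in ps]


P_unsigned = representable_values(4, 2, signed=False)
P_signed = representable_values(4, 2, signed=True)

# Distance between each pair of successive elements of the unsigned set.
gaps = {hi - lo for lo, hi in zip(P_unsigned, P_unsigned[1:])}
```

For N = 4 and b = 2, both sets contain 16 elements; the unsigned set runs from 0 to 15/4, the signed set from −2 to 7/4, and successive elements are always 1/4 apart.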
Based on this fact, we can depict the range of values covered for each case as
shown in Figure 5.
Figure 5: Fixed-Point Values on the Real Axis
Thereafter, the representable range of fixed-point numbers is divided into $2^{N}$
equispaced quantization steps, with the distance between two successive steps equal
to $1/2^{b}$. Suppose that $x \in \mathbb{R}$ is approximated by a fixed-point
number a. The positions of these values are labeled in Figure 5. The error
$|x - a|$ is hence less than the length of one interval, or $1/2^{b}$, as stated in
the second main theorem (Theorem 5).
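The interval argument can also be checked numerically. The Python sketch below is an illustration only: truncate and round_nearest are hypothetical helpers working on exact fractions, the word length is treated as unbounded so that overflow never occurs, and the samples are an arbitrary grid of rationals.

```python
from fractions import Fraction
import math


def truncate(x, b):
    """Largest multiple of 1/2**b that does not exceed x."""
    return Fraction(math.floor(x * 2 ** b), 2 ** b)


def round_nearest(x, b):
    """Nearest multiple of 1/2**b to x (exact ties go upward)."""
    return Fraction(math.floor(x * 2 ** b + Fraction(1, 2)), 2 ** b)


b = 3
samples = [Fraction(k, 1000) for k in range(-2000, 2000)]

# Worst observed quantization errors over the sample grid.
worst_trn = max(abs(truncate(x, b) - x) for x in samples)
worst_rnd = max(abs(round_nearest(x, b) - x) for x in samples)
```

On this grid, the truncation error stays below one quantization step $1/2^{b}$, and the round-to-nearest error stays within half a step, matching the two bounds discussed for Theorem 5.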
In HOL, we first proved that the quantization result is the nearest value to
a real number and that the corresponding error is minimal compared to the other
fixed-point numbers (Lemma 9).
Lemma 9: FXP_ERROR_AT_WORST_LEMMA
⊢ (validAttr X) ∧ ¬(x > MAX (X)) ∧ ¬(x < MIN (X)) ∧
(Isvalid a) ∧ (Attrib a = X) ⇒
abs (fxperror X o_mode q_mode n_bits x) ≤ abs (value a − x)
Then we proved that each representable real value x can be surrounded by two
successive rational numbers (Lemma 10).
Lemma 10: FXP_ERROR_BOUND_LEMMA1
⊢ (validAttr X) ∧ ¬(x > MAX (X)) ∧ ¬(x < MIN (X)) ⇒
∃k. (k < 2 EXP wordlength X) ∧ (&k / (&2 pow fracbits X) ≤ x) ∧
(x < (&(SUC k) / (&2 pow fracbits (X))))
Also, we proved that the difference between the real number and the surrounding
rationals is bounded by $1/2^{\mathrm{fracbits}(X)}$ (Lemma 11).
Lemma 11:
⊢ (validAttr X) ∧ ¬(x > MAX (X)) ∧ ¬(x < MIN (X)) ⇒
∃k. (k ≤ 2 EXP wordlength X) ∧
abs (x − &k / (&2 pow (fracbits (X)))) ≤ inv (&2 pow (fracbits (X)))
Finally, we proved that for each real value we can find a fixed-point number
with the required error characteristics (Lemma 12).
Lemma 12: FXP_ERROR_BOUND_LEMMA3
⊢ (validAttr X) ∧ ¬(x > MAX (X)) ∧ ¬(x < MIN (X)) ⇒ ∃(w: bool word).
abs (value (fxp (w,X)) − x) ≤ inv (&2 pow (fracbits X)) ∧
(WORDLEN w = wordlength X)
Since the quantization produces the minimum error, as stated in Lemma 9,
the proof of the second main theorem (Theorem 5) is a direct consequence of
Lemma 12. In these proofs, we have treated the cases of signed and unsigned
numbers separately, since they have different definitions for the MAX, MIN, and
value functions. For signed numbers, special attention also needs to be paid to
negative numbers.
6 Application with SPW
In this section, we demonstrate how to apply the formalization of fixed-point
arithmetic presented in the previous sections to the verification of the transition
from floating-point to fixed-point algorithmic levels. We have chosen SPW as the
application tool and an integrator as the example circuit. A digital integrator
is a discrete-time system that transforms a sequence of input numbers into another
sequence of outputs by means of a specific computational algorithm. To describe




}, and a denote the
input sequence, output sequence, and constant coefficient of the integrator,
respectively.








Thereafter, the output sequence at time t is equal to the input sequence at time



























Figure 6: SPW Design of an Integrator
Figure 6 shows the SPW design of an integrator. The integrator is first designed
and simulated using the SPW predefined floating-point blocks and parameters
(Figure 6 (a)). The design is composed of an adder (M1), a multiplier by
constant (M2), and a delay (M3) block, together with signal source (M4) and sink
(M5) elements. The input signal, the output signal, and the outputs of the adder
and multiplier blocks are labeled IN', OUT', S1', and S2', respectively. Figure
6 (b) shows the converted fixed-point design, in which each block is replaced with
the corresponding fixed-point block (M1', M2', M3', M4', M5'). Fixed-point blocks
are shown by double circles and squares to distinguish them from the
floating-point blocks. The attributes of all fixed-point block outputs are set to
(64, 31, t) to ensure that overflow and quantization do not affect the system
operation. The corresponding fixed-point signals are labeled IN", OUT", S1", and
S2".
In HOL, we first model the design at each level as predicates in higher-order
logic.










































































where X is the floating-point format. In these definitions, we have used the
available formalization of floating-point arithmetic in HOL [15]. Floating-point
data types are stored in SPW in the standard IEEE 64-bit double precision format.




















































































= FST (fxpAdd X
0











= FST (fxpMul X
0





In the next step, we describe each design as a difference equation relating the







































The following lemmas ensure that the implementation at each level satisfies
the corresponding specification.
Lemma 13: FLOAT_INTEGRATOR_IMP_SPEC































Now we assume that the floating-point and fixed-point input sequences are the










t = FST (fxp_round X' o_mode q_mode n_bits (IN t))
where round is the floating-point rounding function, and To_nearest is the
corresponding mode for rounding to the nearest floating-point number [15]. We also
make some other assumptions on the finiteness and validity of the floating-point
and fixed-point inputs, coefficients, and intermediate results, in order to have
finite and valid final outputs. Using these assumptions and based on the theorems
FXP_ADD_THM and FXP_MUL_THM (Section 5) and the corresponding ones in
floating-point theory [15], we prove the following theorem concerning the error
between the real values of the floating-point and fixed-point precision integrator
output samples.
Theorem 6: INTEGRATOR_THM






























(t − 1)) +





(t − 1))) +
error (Val (IN'












(t − 1))) +










(t − 1))) −
fxperror X' o_mode q_mode n_bits (IN (t − 1))
where Val is the floating-point valuation function, and error is the floating-point
rounding error function [15]. According to Theorem 6, for a valid and finite set of
input and output sequences at time (t - 1) to the integrator design at the
floating-point and fixed-point levels, we can have finite and valid outputs at time
t, and the difference in the real values corresponding to these output samples can
be expressed as the difference in input and output values multiplied by the
corresponding coefficients, taking into account the effects of finite precision in
coefficients and arithmetic operations. To find a constant upper bound for the
difference between the outputs, we use Theorem 5 on the fixed-point error
quantification. Similarly, for the floating-point error bound analysis, we proved
the following lemma:
Lemma 15: ERROR_BOUND_NORM_STRONG_NORMALIZE
⊢ normalizes X x ⇒
∃j. abs (error x) ≤ (2 pow j / 2 pow (bias X + fracwidth X))
where normalizes defines the criteria for an arbitrary real number to be in the
range of normalized floating-point numbers, bias defines the exponent bias in the
floating-point format, which is a constant used to make the exponent's range
non-negative, and fracwidth extracts the fraction width parameter from the
floating-point format. According to Lemma 15, if the absolute value of a real number
is in the representable range of the normalized floating-point numbers with the
format X and located in the j'th binade (the floating-point numbers between two
adjacent powers of 2), then the absolute value of the corresponding error is
bounded by $2^{j} / 2^{(\mathrm{bias}\ X + \mathrm{fracwidth}\ X)}$. The lemma is
proved based on the general floating-point absolute error bound theorem developed
in [15].
Finally, we proved the following theorem (Theorem 7) that bounds the output
error of the integrator design in the transition from the floating-point to the
fixed-point level.
Theorem 7: INTEGRATOR_FP_TO_FXP_ERROR_BOUND_THM
















∃ j1 j2 j3.
abs (Val (OUT' t) − value (OUT" t)) ≤
2 * abs (a) * M +
(2 pow j1 + 2 pow j2 + 2 pow j3) / 2 pow (bias X + fracwidth X) +




In the proof of this theorem, we have assumed that the real values of the
floating-point and fixed-point integrator coefficients are equal (Val a' = value a"
= a), hence ignoring the effects of inaccuracies in the integrator coefficient. We
have also assumed that the floating-point and fixed-point output values are bounded
by a constant value (M). The parameters j1, j2, and j3 are related to the binades
in which the real-valued arguments of the three floating-point error expressions in
Theorem 6 are located.
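The kind of gap that Theorem 7 bounds can be observed in a small simulation. The Python sketch below rests on several assumptions that are not taken from the HOL development: the integrator is modelled by the recurrence y[t] = x[t−1] + a·y[t−1], suggested by the adder, constant multiplier, and delay blocks of Figure 6; the fixed-point path keeps 31 fractional bits and quantizes by truncation; the 64-bit word length is not modelled because the signals stay far from overflow; and the names to_fixed and max_output_gap are hypothetical.

```python
from fractions import Fraction
import math

FRAC_BITS = 31  # fractional bits of the assumed (64, 31, t) output format


def to_fixed(x):
    """Truncate an exact rational to a multiple of 2**-FRAC_BITS."""
    return Fraction(math.floor(Fraction(x) * 2 ** FRAC_BITS), 2 ** FRAC_BITS)


def max_output_gap(xs, a):
    """Largest gap between the double-precision and fixed-point outputs."""
    a_fx = to_fixed(a)
    y_fl, y_fx = 0.0, Fraction(0)
    worst = Fraction(0)
    for x in xs:
        # Floating-point path: Python floats are IEEE double precision.
        y_fl = x + a * y_fl
        # Fixed-point path: quantize the input and the product; the sum of
        # two fixed-point values with the same format needs no rounding.
        y_fx = to_fixed(x) + to_fixed(a_fx * y_fx)
        worst = max(worst, abs(Fraction(y_fl) - y_fx))
    return worst


xs = [math.sin(0.1 * t) for t in range(200)]
worst = max_output_gap(xs, 0.5)
```

With a = 0.5 and inputs bounded by 1, each step adds at most two truncation errors of size $2^{-31}$ on the fixed-point side, so the observed gap remains a small multiple of $2^{-31}$, in line with a bound that depends on the output precision.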
7 Conclusions
In this paper, we established the formalization of fixed-point arithmetic in the
HOL theorem prover. Unlike floating-point arithmetic, there is no standard for
the fixed-point counterpart. We hence defined in this paper a complete common
set of the fixed-point arithmetic supported by most DSP tools, in particular SPW
and SystemC. We started by encoding the fixed-point arithmetic in HOL,
considering different quantization and overflow modes, as well as exception
handling. We then proved two main theorems stating that the operations on
fixed-point numbers are closely related to the corresponding operations on
infinitely precise values, considering some error. The error is bounded by a certain
absolute value, which is a function of the output precision. We have also shown by
an example how these theorems can be used as a basis for the analysis of the
quantization errors in the design of fixed-point DSP subsystems. The formalization
presented in this paper can be considered a complement to the floating-point
formalizations that are widely available in the literature. Based on the proposed
fixed-point formalization, our immediate future work will focus on the verification
of the transition from the floating-point algorithmic level to hardware
implementations for DSP applications.
References
[1] M. D. Aagaard and C.-J. H. Seger, "The Formal Verification of a Pipelined
Double-Precision IEEE Floating-Point Multiplier," In Proceedings International
Conference on Computer Aided Design, pp. 7-10, San Jose, California,
USA, November 1995.
[2] G. Barrett, "Formal Methods Applied to a Floating Point Number System,"
IEEE Transactions on Software Engineering, SE-15 (5): 611-621, May 1989.
[3] C. Berg and C. Jacobi, "Formal Verification of the VAMP Floating Point Unit,"
In Correct Hardware Design and Verification Methods, LNCS 2144, pp. 325-339,
Springer-Verlag, 2001.
[4] S. Beyer, C. Jacobi, D. Kroning, D. Leinenbach, and W. J. Paul, "Instantiating
Uninterpreted Functional Units and Memory System: Functional Verification
of the VAMP," In Correct Hardware Design and Verification Methods, LNCS
2860, pp. 51-65, Springer-Verlag, 2003.
[5] S. Boldo, M. Daumas, and L. Théry, "Formal Proofs and Computations in
Finite Precision Arithmetic," In Proceedings of the 11th Symposium on the
Integration of Symbolic Computation and Mechanized Reasoning, pp. 101-111,
Rome, Italy, September 2003.
[6] S. Boldo and M. Daumas, "Properties of Two's Complement Floating Point
Notations," Software Tools for Technology Transfer, 5 (2-3): 237-246, March
2004.
[7] Cadence Design Systems, Inc., "Signal Processing WorkSystem (SPW) User's
Guide," USA, July 1999.
[8] V. A. Carreno, "Interpretation of IEEE-854 Floating-Point Standard and
Definition in the HOL System," NASA TM-110189, September 1995.
[9] M. Cornea-Hasegan, "Proving the IEEE Correctness of Iterative Floating-Point
Square Root, Divide, and Remainder Algorithms," Intel Technology Journal,
Q2: 1-11, 1998.
[10] Y.-A. Chen and R. E. Bryant, "Verification of Floating Point Adders," In
Computer Aided Verification, LNCS 1427, pp. 488-499, Springer-Verlag, 1998.
[11] M. Daumas, L. Rideau, and L. Théry, "A Generic Library for Floating-Point
Numbers and Its Application to Exact Computing," In Theorem Proving in
Higher Order Logics, LNCS 2152, pp. 169-184, Springer-Verlag, 2001.
[12] M. J. C. Gordon and T. F. Melham, "Introduction to HOL: A Theorem
Proving Environment for Higher-Order Logic," Cambridge University Press,
1993.
[13] J. R. Harrison, "Constructing the Real Numbers in HOL," Formal Methods
in System Design, 5 (1/2): 35-59, 1994.
[14] J. R. Harrison, "A Machine-Checked Theory of Floating-Point Arithmetic,"
In Theorem Proving in Higher Order Logics, LNCS 1690, pp. 113-130,
Springer-Verlag, 1999.
[15] J. R. Harrison, "Floating-Point Verification in HOL Light: The Exponential
Function," Formal Methods in System Design, 16 (3): 271-305, 2000.
[16] J. R. Harrison, "Formal Verification of Floating Point Trigonometric
Functions," In Formal Methods in Computer-Aided Design, LNCS 1954, pp. 217-233,
Springer-Verlag, 2000.
[17] J. R. Harrison, "Formal Verification of IA-64 Division Algorithms," In
Theorem Proving in Higher Order Logics, LNCS 1869, pp. 234-251, Springer-Verlag,
2000.
[18] The Institute of Electrical and Electronic Engineers, Inc., "IEEE, Standard
for Binary Floating-Point Arithmetic," ANSI/IEEE Standard 754, USA, 1985.
[19] The Institute of Electrical and Electronic Engineers, Inc., "IEEE, Standard
for Radix-Independent Floating-Point Arithmetic," ANSI/IEEE Std 854, USA,
1987.
[20] R. Kaivola and M. D. Aagaard, "Divider Circuit Verification with Model
Checking and Theorem Proving," In Theorem Proving in Higher Order Logics,
LNCS 1869, pp. 338-355, Springer-Verlag, 2000.




Floating-Point Multiplier," In Proceedings Design Automation and Test in
Europe Conference, pp. 20-27, Paris, France, March 2002.




4 Floating-Point Divider," Software Tools for Technology
Transfer, 4 (3): 323-334, 2003.
[23] H. Keding, M. Willems, M. Coors, and H. Meyr, "FRIDGE: A Fixed-Point
Design and Simulation Environment," In Proceedings Design Automation and
Test in Europe Conference, pp. 429-435, Paris, France, February 1998.
[24] M. Leeser and J. O'Leary, "Verification of a Subtractive Radix-2 Square Root
Algorithm and Implementation," In Proceedings International Conference on
Computer Design, pp. 526-531, Austin, Texas, USA, October 1995.
[25] Mathworks, Inc., "Simulink Reference Manual," USA, 1996.
[26] Mathworks, Inc., "Fixed-Point Blockset, For Use with Simulink, User's
Guide," USA, 2004.
[27] Mentor Graphics, Inc., "DSP Station User's Manual," USA, 1993.
[28] T. F. Melham, "The HOL pred_sets Library," University of Cambridge,
Computer Laboratory, February 1992.
[29] P. S. Miner, "Defining the IEEE-854 Floating-Point Standard in PVS," NASA
TM-110167, June 1995.
[30] P. S. Miner and J. F. Leathrum, "Verification of IEEE Compliant Subtractive
Division Algorithms," In Formal Methods in Computer-Aided Design, LNCS
1166, pp. 64-78, Springer-Verlag, 1996.
[31] J. S. Moore, T. Lynch, and M. Kaufmann, "A Mechanically Checked Proof
of the Correctness of the Kernel of the AMD5K86 Floating-Point Division
Algorithm," IEEE Transactions on Computers, 47 (9): 913-926, 1998.
[32] S. M. Mueller and W. J. Paul, "Computer Architecture. Complexity and
Correctness," Springer-Verlag, 2000.
[33] Open SystemC Initiative, "SystemC Language Reference Manual," USA,
2004.
[34] J. O'Leary, X. Zhao, R. Gerth, and C.-J. H. Seger, "Formally Verifying IEEE
Compliance of Floating-Point Hardware," Intel Technology Journal, Q1: 1-14,
1999.
[35] D. M. Russinoff, "A Case Study in Formal Verification of Register-Transfer
Logic with ACL2: The Floating-Point Adder of the AMD Athlon Processor,"
In Formal Methods in Computer-Aided Design, LNCS 1954, pp. 3-36,
Springer-Verlag, 2000.
[36] J. Sawada and R. Gamboa, "Mechanical Verification of a Square Root
Algorithm using Taylor's Theorem," In Formal Methods in Computer-Aided
Design, LNCS 2517, pp. 274-291, Springer-Verlag, 2002.
[37] Synopsys, Inc., "CoCentric™ System Studio User's Guide," USA, August
2001.
[38] University of Cambridge, "The HOL System Reference," Computer
Laboratory, Cambridge, UK, March 2004.
[39] W. Wong, "Modeling Bit Vectors in HOL: The Word Library," In Higher
Order Logic and Its Applications, LNCS 780, pp. 371-384, Springer-Verlag,
1994.
