Abstract-This paper proposes a framework for the incorporation of formal methods in the design flow of digital signal processing (DSP) systems in a rigorous way. In the proposed approach, DSP descriptions were modeled and verified at different abstraction levels using higher order logic based on the higher order logic (HOL) theorem prover. This framework enables the formal verification of DSP designs that in the past could only be done partially using conventional simulation techniques. 
I. INTRODUCTION

D
IGITAL system design is characterized by an ever increasing system complexity that has to be implemented within reduced time, resulting in minimum costs and short time-to-market. These characteristics call for a seamless design flow that allows to perform the design steps on the highest suitable level of abstraction. For most digital signal processing (DSP) systems, the design has to result in a fixed-point (FXP) implementation. This is due to the fact that these systems are sensitive to power consumption, chip size, and price per device. FXP realizations outperform floating-point (FP) realizations by far with regard to these criteria. Fig. 1 illustrates a general DSP design flow as used nowadays in leading industry design projects.
The design of DSP systems starts from an ideal real number specification. In the theoretical analysis of digital systems, we generally assume that signal values and system coefficients are represented in the real number system and expressed to infinite precision. This allows to ignore the effects of finite word lengths and fixed exponents and to abstract from all implementation details. When implemented in a special-purpose digital hardware or as a computer algorithm, we must represent signals and coefficients in some digital number system that must always be of finite precision. In this case, attention must be paid to the effects of using finite register lengths to represent all relevant design parameters [32] . Despite the advantages offered by digital networks, there is an inherent accuracy problem associated with DSP systems since the signals are represented by a finite number of bits and the arithmetic operations must be carried out with an accuracy limited by the finite word length of the number representation. Depending on the type of arithmetic used in the system algorithm, the type of quantization used to reduce the word length to a desired size, and the exact system structure used, one can generally estimate how system performance is affected by these finite precision effects. There are several types of arithmetics used in the implementation of digital systems. Among the most common are FP and FXP. At the FP and FXP levels, all operands are represented by a special format 0278-0070/$20.00 © 2006 IEEE or assigned a fixed word length and a fixed exponent, while the control structure and the operations of the ideal program remain unchanged. The transformation from real (numbers) to FP and FXP is quite tedious and error prone. On the implementation side, the FXP model of the algorithm has to be transformed/synthesized into the best-suited target description either using a hardware description language (HDL) or a programming language. Meeting the above (sometimes conflicting) requirements is a great challenge in any DSP design project. The above design process can be aided by a number of specialized CAD tools such as Signal Processing WorkSystem (SPW, Cadence) [10] , CoCentric (Synopsys) [11] , MatlabSimulink (Mathworks) [28] , and FRIDGE (Aachen UT) [26] . The conformance of the FXP implementation with respect to descriptions in FP or real algorithm on one hand, and register transfer (RT) and gate levels on the other hand, is verified by simulation techniques. Simulation, however, is known to provide partial verification as it cannot cover all design errors, especially for large systems. On the other hand, adopting formal verification in system design generally means using methods of mathematical proof rather than simulation to ensure the quality of the design, to improve the robustness of a design, and to speed up the development. The overall aim of this paper is to propose a general methodology for the formalization and verification of DSP descriptions at different abstraction levels using higher order logic. To this end, we adopt a shallow embedding for DSP descriptions in which we translate the intended meaning of design blocks into higher order logic and then complete the formal proof in the HOL [17] theorem proving environment. To the best our knowledge, this is the first time that formal methods are applied to DSP modeling and verification in such a rigorous way.
The rest of the paper is organized as follows. Section II describes the proposed DSP formal verification methodology. Section III presents a case study verification of fast Fourier transform (FFT) algorithms in HOL from real number specification to RTL implementation. Section IV discusses related work. Finally, Section V concludes the paper and outlines future research directions.
II. PROPOSED DSP VERIFICATION FRAMEWORK
In this paper, we propose a methodology for applying formal methods to the design flow of DSP systems in a rigorous way. The corresponding commutating diagram is shown in Fig. 2 . Thereafter, we model the ideal real specification of the DSP algorithms and the corresponding FP and FXP representations as well as the RT and gate level implementations as predicates in higher order logic. The overall methodology for the formal specification and verification of DSP algorithms will be based on the idea of shallow embedding of languages [6] using the HOL theorem proving environment [17] . In the proposed approach, we first focus on the transition from real to FP and FXP levels. For this, we make use of existing theories in HOL on the construction of real [18] and complex [21] numbers, the formalization of IEEE-754 standard [24] based FP arithmetic [19] , [20] , and the formalization of FXP arithmetic [5] . We use valuation functions to find the real values of the FP and FXP DSP outputs and define the error as the difference between these values and the corresponding output of the ideal real specification. Then, we establish fundamental lemmas on the error analysis of FP and FXP roundings and arithmetic operations against their abstract mathematical counterparts. Finally, based on these lemmas, we derive expressions for the accumulation of roundoff error in FP and FXP DSP algorithms using recursive definitions and initial conditions. While theoretical work on computing the errors due to finite precision effects in the realization of DSP algorithms with FP and FXP arithmetics has been extensively studied since the late 1960s [25] , this paper contains the first formalization and proof of this analysis using a mechanical theorem prover, here HOL. The formal results are found to be in good agreement with the theoretical ones.
After handling the transition from real to FP and FXP levels, we turn to HDL representation. At this point, we use well-known techniques to model the DSP design at the RTL level within the HOL environment. The last step is to verify this level using a classical hierarchical proof approach in HOL [29] . In this way, we hierarchically prove that DSP RTL implementation implies high-level FXP algorithmic specification, which has already been related to the FP description and the ideal real specification through error analysis. The verification can be extended, following a similar manner, down to gate level netlist either in HOL or using other commercial verification tools as depicted in Fig. 2 .
The process of specifying an HDL in higher order logic is commonly known as semantic embedding. There are two main approaches [6] : deep embedding and shallow embedding. In deep embedding, the abstract syntax of a design description is represented by terms, which are then interpreted by semantic functions defined in the logic that assign meaning to the design. With this method, it is possible to reason about classes of designs since one can quantify over the syntactic structures. However, setting up HOL types of abstract syntax and semantic functions can be very tedious. In a shallow embedding, on the other hand, the design is modeled directly by a formal specification of its functional behavior. This eliminates the effort of defining abstract syntax and semantic functions, but it also limits the proofs to functional properties. In this paper, since our main concern is to check the correctness of the designs based on their functionality, we propose shallow embedding for DSP descriptions: translate the intended meaning of DSP block designs as described in its documentation into HOL and then complete the formal proof in the HOL theorem prover.
A. Application With SPW
In this section, we demonstrate how the proposed methodology can be used for the verification of an Integrator designed in SPW. The SPW of Cadence [10] is an integrated framework for developing DSP and communications products. It graphically represents a system as a network of functional blocks and comes with a vast library of DSP blocks, and users can also add their own blocks or build intellectual property (IP) blocks by composition of primitive blocks. SPW provides all the tools needed to interactively capture, simulate, test, and implement a broad range of DSP designs. Typical design applications include digital communication systems, image processing, multimedia, radar systems, control systems, digital audio, and highdefinition television. SPW can be used to evaluate various architectural approaches to a design and to develop, simulate, and fine-tune algorithms. A design project in SPW typically consists of the same steps depicted in Fig. 1 . More details about SPW design flow and the application of our methodology with it can be found in [4] . To briefly illustrate our approach, we show next the application of our methodology on a simple integrator designed in SPW.
A digital integrator is a discrete time system that transforms a sequence of input numbers into another sequence of output by means of a specific computational algorithm. To describe the general functionality of a digital integrator, let {x t }, {w t }, and a denote the input sequence, output sequence, and constant coefficient of the integrator, respectively. Then the integrator can be specified by the difference equation
Thereafter, the output sequence at time t is equal to the input sequence at time t − 1, added to the output at time t − 1 multiplied by the integrator coefficient. Fig. 3 shows the SPW design of an integrator. The integrator is first designed and simulated using the SPW predefined FP blocks and parameters [ Fig. 3(a) ]. The design is composed of an adder (M 1), a multiplier by constant (M 2), and a delay (M 3) block, together with signal source (M 4) and sink (M 5) elements. The input signal, the output signal, and the output of the adder and multiplier blocks are labeled by IN , OUT , S1 , and S2 , respectively. Fig. 3(b) shows the converted FXP design in which each block is replaced with the corresponding FXP block (M 1 , M2 , M3 , M4 , M5 ). FXP blocks are shown by double circles and squares to distinguish them from FP blocks. The attributes of all FXP block outputs are set to (64, 31, t) to ensure that overflow and quantization do not affect system operation. The corresponding FXP signals are labeled by IN , OUT , S1 , and S2 . In HOL, we first model the design at each level as predicates in higher order logic. The predicates corresponding to the FP design in IEEE 64-bit double precision format are
The HOL description of the FXP implementation is
Fxp_Add_Block IN S2 S1
∧ Fxp_Delay_Block S1 OUT ∧ Fxp_Gain_Block OUT a S2 .
In the next step (shown at the bottom of the page), we describe each design as a difference equation relating the input and output samples according to (1) .
The following theorems (Theorems 1 and 2) ensure that the implementation at each level satisfies the corresponding specification. According to this theorem, for a valid and finite set of input and output sequences at time (t − 1) to the integrator design at FP and FXP levels, we can have finite and valid outputs at time t, and the difference in real values corresponding to these output samples can be expressed as the difference in input and output values multiplied by the corresponding coefficients, taking into account the effects of finite precision in coefficients and arithmetic operations. The functions Floaterror and Fxperror represent the errors resulting from rounding the real operation results to a FXP and FP number, respectively. These errors are already quantified using the theorems mentioned in [5] for FXP arithmetic and the corresponding theorems for error analysis in the FP case [20] . Next, we generated with SPW the VHDL code corresponding to the filter design and used Synopsys to synthesize the gate level netlist. The resulting codes are then translated into HOL notation and the corresponding correctness theorems established as follows (Theorems 4 and 5).
Here, the input and output signals IN and OUT are Boolean words. To relate them to the corresponding specifications in fixed-point and FP, we make use of the bijection functions Fxp [5] and Float [20] , respectively. In the proof of these theorems, we used the modular behavior of the circuit so that we proved separate lemmas for different modules such as adder, multiplier, and delay, and then used these lemmas in the proof of the original theorems.
Finally, using the obtained Theorems 1-5, we can easily deduce our ultimate theorem (Theorem 6) proving the correctness of the FP specification from gate-level implementation, taking into account the error analysis computed beforehand.
More details about this analysis to find the error bounds can be found in [5] . In the rest of the paper, we demonstrate in more detail how the error analysis and verification methodology presented in this section can be used for the verification of the FFT algorithms implemented in different canonical forms of realization. A similar discussion can be applied to other types of signal analysis algorithms.
III. CASE STUDY: FFT ERROR ANALYSIS AND VERIFICATION
FFT [8] , [12] is a highly efficient method for computing the discrete Fourier transform (DFT) coefficients of a finite sequence of complex data. Because of the substantial time saving over conventional methods [31] , FFT has found important applications in a number of diverse fields such as spectrum analysis, speech and optical signal processing, and digital filter design. FFT algorithms are based on the fundamental principle of decomposing the computation of the DFT of a
finite-length sequence of length N into successively smaller DFTs. The manner in which this principle is implemented leads to a variety of different algorithms, all with comparable improvements in computational speed. There are two basic classes of FFT algorithms for which the number of arithmetic multiplications and additions as a measure of computational complexity is proportional to N log N rather than N 2 as in conventional methods. The first, proposed by Cooley and Tukey [13] , called decimation-in-time (DIT), derives its name from the fact that in the process of arranging the computation into smaller transformations the input sequence (generally thought of as a time sequence) is decomposed into successively smaller subsequences. In the second general class of algorithms proposed by Gentleman and Sande [16] , the sequence of DFT coefficients is decomposed into smaller subsequences, hence its name decimation-in-frequency (DIF).
As our case study in this paper, we consider the formal verification of the DIT and DIF FFT algorithms. We used our methodology to derive expressions for the accumulation of roundoff error in FP and FXP FFT algorithms by recursive definitions and initial conditions, considering the effects of input quantization and inaccuracy in the coefficients. Based on the extensively studied theoretical work on computing the errors due to finite precision effects in the realization of FFT algorithms with FP and FXP arithmetics [25] , we perform a similar analysis using the HOL theorem proving environment. The formal results are found to be in good agreement with the theoretical ones. Finally, we prove hierarchically that the FFT RTL implementation implies the high-level FXP algorithmic specification that has already been related to the FP description and ideal real specification through error analysis.
A. Error Analysis of FFT Algorithms in HOL
In this section, the principal results for accumulation of error in FFT algorithms using HOL theorem proving are derived and summarized. For the most part, the following discussion is phrased in terms of the radix-2 algorithm. However, most of the ideas employed in the error analysis of radix-2 algorithms can be utilized in the analysis of other algorithms. In the following, we first analyze error in the DIF FFT algorithm. Then, we perform a similar analysis for the DIT FFT algorithm. In either case, we will first describe in detail the theory behind the analysis and then explain how this analysis is performed in HOL.
1) DIF FFT Algorithm:
The DFT of a sequence {x(n)}
is defined as [31] A
where
np are called twiddle factors. For simplicity, our discussion is restricted to the radix-2 FFT algorithm, in which the number of points N to be Fourier transformed satisfies the relationship N = 2 m , where m is an integer value. The results can be extended to radices other than 2. By using the FFT method, the Fourier coefficients {A(p)} N −1 p=0 can be calculated in m = log 2 N iterative steps. At each step, an array of N complex numbers is generated by using only the numbers in the previous array. To explain the FFT algorithm, let each integer p, p = 0, 1, 2, . . . , N − 1, be expanded into a binary form as
and let p * denote the number corresponding to the reverse bit sequences of p, i.e.,
p=0 denote the N complex numbers calculated at the kth step. Then, the DIF FFT algorithm [25] can be expressed as
and
Equation (5) 4 . There are three common sources of errors associated with the FFT algorithms [25] , namely: 1) input quantization: caused by the quantization of the input signal {x n } into a set of discrete levels; 2) coefficient accuracy: caused by the representation of the coefficients {w k (p)} by a finite word length; 3) roundoff accumulation: caused by the accumulation of roundoff errors at arithmetic operations.
Therefore, the actual array computed by using (5) , respectively. Then, we define the corresponding errors of the pth element at step k as
where e k (p) and e k (p) are the errors between the actual FP and FXP implementations and the ideal real specification, respectively. e k (p) is the error in the transition from FP to FXP.
In analyzing the effect of FP roundoff, the effect of rounding will be represented multiplicatively. Let * denote any of the arithmetic operations +, −, ×, /. It is known [14] , [36] that if p represents the precision of the FP format, then
The notation fl(.) is used to denote that the operation is performed using FP arithmetic. The above theorem relates the FP arithmetic operations such as addition, subtraction, multiplication, and division to their abstract mathematical counterparts according to the corresponding errors.
While the rounding error for FP arithmetic enters into the system multiplicatively, it is an additive component for FXP arithmetic. In this case, the fundamental error analysis theorem for FXP arithmetic operations against their abstract mathematical counterparts can be stated as (11) and fracbits is the number of bits that are to the right of the binary point in the given FXP format X. The notation fxp(.) is used to denote that the operation is performed using FXP arithmetic. We have proved (10) and (11) as theorems in higher order logic within HOL. These theorems are proved under the assumption that there is no overflow or underflow in the operation result. This means that the input values are scaled so that the actual value of the result is located in the ranges defined by the maximum and minimum representable values of the given FP and FXP formats. The details can be found in [3] . In (5), {A k (p)} are complex numbers, so their real and imaginary parts are calculated separately. Let where q = p + 2 m−1−k and r = p − 2 m−1−k . On using prime and double prime to denote the calculated FP and FXP results, the real and imaginary parts of A k+1 (p) and A k+1 (p) are given, respectively, by
The corresponding error flowgraph showing the effect of roundoff error using the fundamental FP and FXP error analysis theorems according to (10) and (11), respectively, is given in Fig. 5 , which also indicates the order of calculation. 
, and λ k,p . Thereafter, the actual real and imaginary parts of the FP and FXP outputs A k+1 (p) and A k+1 (p), respectively, can be given explicitly by
The errors e k (p), e k (p), and e k (p) defined in (7)- (9) are complex and can be rewritten as From (13) and (16)- (20), we derive the following error analysis cases.
1) DIF FFT real to FP where f k (p) is given by (23) , shown at the bottom of the page.
2) DIF FFT real to FXP
where f k (p) is given by (25) , shown at the bottom of the next page.
where f k (p) and f k (p) are given by (23) and (25) .
The accumulation of roundoff error is determined by the recursive equations (22)- (26), with initial conditions given by (21) .
2) DIT FFT Algorithm:
p=0 denote the N complex numbers calculated at the kth step. Then the DIT FFT algorithm [27] can be expressed as
, where
Equation (27) is carried out for
, where p and p * are expanded and defined as in (7) and (8), respectively. It can be shown [13] that at the last step {A m (p)} N −1 p=0 are the discrete Fourier coefficients in normal order. Specifically, A m (p) = A(p). Fig. 6 shows the signal flowgraph of the actual computation for the case N = 2 4 .
Similar to the discussion of error analysis of DIF FFT, we first rewrite (27) using real and imaginary parts as
where q = p + 2 k and r = p − 2 k . We also use prime and double prime to denote the calculated FP and FXP results as A k+1 (p) and A k+1 (p). Similarly, we can express the real and imaginary parts of A k+1 (p), B k+1 (p), and C k+1 (p), and A k+1 (p), B k+1 (p), and C k+1 (p), using the FP and FXP operations, respectively, i.e.,
The corresponding error flowgraph showing the effect of roundoff error using the fundamental FP and FXP error analysis theorems according to (10) and (11), respectively, is given in Fig. 7 , which also indicates the order of calculation.
The quantities 
, and β k,p . Thereafter, the actual real and imaginary parts of the FP and FXP outputs A k+1 (p) and A k+1 (p), respectively, can be given explicitly by (32) and (33), shown at the bottom of the next page.
From (29), (32) , and (33), we derive the following error analysis cases.
1) DIT FFT real to FP
where f k (p) is given by (35) , shown at the bottom of the page.
2) DIT FFT real to FXP
where f k (p) is given by
3) DIT FFT FP to FXP
where f k (p) and f k (p) are given by (35) and (37), respectively.
The accumulation of roundoff error is determined by the recursive equations (34)- (38), with the initial conditions given by (21) .
3) Effects of Input Quantization and Coefficient Inaccuracy:
The discussion presented in previous sections concerns only the roundoff accumulation effect. As mentioned before, there are two other common causes of error due to the finite word length in computing Fourier coefficients. They are the quantization of
the input data x(n) and the inaccuracy of the coefficients w k (p). The effect of the quantization of x(n) can be treated as follows. Let x (n) and x (n) be the FP and FXP quantized versions of x(n), respectively. Then, from the discussion in Section III-A1, we can write
where θ n and ξ n are the errors caused by FP quantization, and θ n and ξ n are the errors caused by FXP quantization in the input signal. The effect of (39) and (40) modifies the initial conditions as described in (21) to
It can be shown that with these modifications the final results of the mean square errors remain the same except for an addition term that is independent of p [25] .
Another cause for error that has been neglected in the treatment of the previous sections is the fact that the coefficients w k (p) can only be represented in finite accuracy. It is possible to analyze the effect of the inaccuracy of w k (p) as follows. Let U k (p) and V k (p) be the real and imaginary parts of w k (p) as defined in (13), respectively. Also, let U k (p) and U k (p) be the FP and FXP quantized version of U k (p), and V k (p) and V k (p) be the FP and FXP quantized version of V k (p). Then, from the discussion in Section III-A1, we can write (23), (25), (35) , and (37).
4) Error Analysis in HOL:
In HOL, we first constructed complex numbers on reals similar to [21] . We defined in HOL a new type for complex numbers to be in bijection with R × R. The bijections are written in HOL as complex: R 2 → C and coords: C → R 2 . We used convenient abbreviations for the real (Re) and imaginary (Im) parts of a complex number. We also defined arithmetic operations such as addition, subtraction, and multiplication on complex numbers. We overloaded the usual symbols (+, −, ×) for C and R. Furthermore, we defined, using recursive definition in HOL, expressions for the finite summation on complex numbers. Similarly, we constructed complex numbers on FP and FXP variables. We also defined rounding and valuation functions for FP and FXP complex numbers. Then, we defined the principal N -roots on unity (e −j2πn/N = cos(2πn/N ) − j sin(2πn/N )) and its powers as a complex number using the sine and cosine functions available in the transcendental theory of HOL reals library [18] . We specified expressions in HOL for expansion of a natural number into a binary form in normal and rearranged order according to (3) , (4), (6) , and (28). The above enables us to specify the FFT algorithms in real, FP, and FXP abstraction levels using recursive definitions in HOL as described in (5) and (27) . Then, we define the real and imaginary parts of the FFT algorithm, and powers of the principal N -roots on unity according to (12) . Later, we proved in separate lemmas that the real and imaginary parts of the FFT algorithm in real, FP, and FXP levels can be expanded as in (13) and (29) . Then, we proved lemmas to introduce an error in each of the arithmetic steps in real and imaginary parts of the FP and FXP FFT algorithms according to (16) , (17), (32) , and (33) . We proved these lemmas using fundamental error analysis lemmas for basic arithmetic operations [3] according to (10) and (11) . Then, we defined in HOL the error of the pth element of the FP and FXP FFT algorithms at step k, and the corresponding error in transition from FP to FXP, according to (7)- (9) . Thereafter, we proved lemmas to rewrite the errors as complex numbers using the real and imaginary parts according to (18)- (20) . Finally, we proved a set of lemmas to determine the accumulation of roundoff error in FP and FXP FFT algorithms by recursive equations and initial conditions according to (21)- (26) for DIF, and (34)- (38) for DIT FFT. A complete list of the derived HOL definitions and theorems can be found in [1] .
B. Radix-4 64-Point FFT Design Verification
In this section, we describe the application of the proposed approach for the verification in HOL of the transition from real, FP, and FXP specifications to RTL implementation of an FFT algorithm. We have chosen the case study of a radix-4 pipelined 64-point complex FFT core available as VHDL RTL model in the Xilinx Coregen library [37] . All proofs have been conducted in HOL, hence establishing correctness of the FFT design implementation with respect to its high-level algorithmic specifications. Fig. 8 shows the overall block diagram of the Radix-4 64-point pipelined FFT design. The basic elements are memories, delays, multiplexers, and dragonflies. In general, the 64-point pipelined FFT requires the calculation of three radix-4 dragonfly ranks. Each radix-4 dragonfly is a successive combination of a radix-4 butterfly with four twiddle factor multipliers. The FFT core accepts naturally ordered data on the input buses in a continuous stream, performs a complex FFT, and streams out DFT samples on the output buses in a natural order. These buses are, respectively, the real and imaginary components of the input and output sequences. An internal input data memory controller orders the data into blocks to be presented to the FFT processor. The twiddle factors are stored in coefficient memories. The real and imaginary components of complex input and output samples and the phase factors are represented as 16-bit 2's complement numbers. The unscrambling operation is performed using output bit-reversing buffer. To define the radix-4 64-point FFT algorithm [8] , [31] , we represent the indices p and n in (2) in base 4 (quaternary number system) as
It is easy to verify that as n 0 , n 1 , and n 2 take on all possible values in the range indicated, n goes through all possible values from 0 to 63 with no values repeated. This is also true for the frequency index p. Using these index mappings, we can express the radix-4 64-point FFT algorithm recursively as
The final result can be written as
Thus, as in the radix-2 algorithm, the results are in reversed order. Based on (48)-(50), and (51), we can develop a signal flowgraph for the radix-4 64-point FFT algorithm as shown in Fig. 9 , which is an expanded version of the pipelined implementation in Fig. 8 . The graph is composed of three successive radix-4 dragonfly stages, with each stage comprising 16 dragonflies.
From (48)- (51), we can express the input-output relationship of a radix-4 64-point FFT as
Equation (52) can be rewritten using real and imaginary parts as (53), shown at the bottom of the page.
The corresponding error flowgraph is given in Fig. 10 . Therefore, the actual real and imaginary parts of the FP and FXP outputs can be given as that shown in (54)-(57), shown at the bottom of the next page.
From (53)- (57), we derive the following error analysis cases. where f and f are the error functions that can be derived based on Fig. 10 .
1) Verification in HOL:
In HOL, we first modeled the RTL description of a radix-4 butterfly as a predicate in higher order logic. The block takes a vector of four complex input data and performs the operations to generate a vector of four complex output signals. The real and imaginary parts of the input and output signals are represented as 16-bit Boolean words. We defined separate functions in HOL for arithmetic operations such as addition, subtraction, and multiplication on complex 2's complement 16-bit Boolean words. Then, we built the complete butterfly structure using a proper combination of these primitive operations.
Thereafter, we described a radix-4 dragonfly block as a conjunction of a radix-4 butterfly and four 16-bit twiddle factor complex multipliers. Finally, we modeled the complete RTL description of the radix-4 64-point structure in HOL. The FFT block is defined as a conjunction of 48 instantiations of radix-4 dragonfly blocks. Proper time instances of the input and output signals are applied to each block, according to Fig. 9 .
Following similar steps, we described the radix-4 64-point FFT structures as FXP, FP, and real domains in HOL using the corresponding complex data types and arithmetic operations.
The formal verification of the radix-4 DIF FFT algorithm case study was performed based on the commutating diagram in Fig. 2 , in that we proved hierarchically that the FFT netlist implies the FFT RTL, and then proved that the FFT RTL description implies the corresponding FXP model. The proof of the FFT block is then broken down into the corresponding proof of the dragonfly block, which itself is broken down into the proofs of butterfly and primitive arithmetic operations. We
used the data abstraction functions described in Section II-A to convert a complex vector of 16-bit 2's complement Boolean words into the corresponding FXP vector. Then, we proved three theorems encompassing the error analysis of the radix-4 DIF FFT algorithm, as discussed in Section III. The first lemma represents the error between the real number specification and the FP specification. The second lemma represents the error between the real number and the FXP specifications. The third lemma represents the error between FP and FXP specifications. According to these lemmas, the FP and FXP implementations and the real specification of a radix-4 DIF FFT algorithm are related to each other based on the corresponding data abstraction and error analysis functions.
Finally, using the obtained theorems, we easily deduced our ultimate theorem proving the correctness of the real specification from the RTL implementation, taking into account the error analysis computed beforehand. A complete list of the derived HOL definitions and theorems can be found in [1] .
IV. RELATED WORK
Work related to our project can be classified in three groups, namely, using formal methods for error analysis, paper-andpencil error analysis of FFT algorithms, and formal verification of FFT designs.
A. Error Analysis in Formal Verification
Previous work on error analysis in formal verification was done by Harrison [20] , who verified the FP algorithms such as the exponential function against their abstract mathematical counterparts using the HOL light theorem prover. As the main theorem, he proved that the FP exponential function has a correct overflow behavior, and in the absence of overflow the error in the result is bounded to a certain amount. He also reported on an error in the hand proof mostly related to forgetting some special cases in the analysis. This error analysis is very similar to the type of analysis performed for DSP algorithms. The major difference, however, is the use of statistical methods and mean square error analysis for DSP algorithms, which is not covered in the error analysis of the mathematical functions used by Harrison. In this method, the error quantities are treated as independent random variables uniformly distributed over a specific interval depending on the type of arithmetic and the rounding mode. Then, error analysis is performed to derive expressions for the variance and mean square error. To perform such an analysis in HOL, we need to develop a mechanized theory on the properties of random variables and random processes. This type of analysis is not addressed in this paper and is a part of our future work.
Huhn et al. [23] proposed a hybrid formal verification method combining different state-of-the-art techniques to guide the complete design flow of imprecisely working arithmetic circuits starting at the algorithmic level down to the RT level. The usefulness of the method is illustrated with the example of discrete cosine transform algorithms. In particular, the authors have shown the use of computer algebra systems like Mathematica or Maple at the algorithmic level to reason about real numbers and to determine certain error bounds for the results of numerical operations. In contrast to [23] , we proposed an error analysis for DSP designs using the HOL theorem prover. Although computer algebraic systems such as Maple or Mathematica are much more popular and have many powerful decision procedures and heuristics, theorem provers are more expressive, more precise, and more reliable [22] . One option is to combine the rigour of the theorem provers with the power of computer algebraic systems as proposed in [22] .
B. Error Analysis of FFT Algorithms
Analysis of errors in FFT realizations due to finite precision effects has traditionally relied on paper-and-pencil proofs and simulation techniques. The roundoff error in using FFT algorithms depends on the algorithm, type of arithmetic, word length, and radix. For FFT algorithms realized with FXP arithmetic, the error problems have been studied extensively. For instance, Welch [35] presented an analysis of the FXP accuracy of the radix-2 DIT FFT algorithm. Thong and Liu [33] presented a general approach to the error analysis of various versions of the FFT algorithm when FXP arithmetic is used. While the roundoff noise for FXP arithmetic enters into the system additively, it is a multiplicative component in the case of FP arithmetic. This problem is analyzed first by Gentleman and Sande [16] , who presented an upper bound on the mean-squared error for FP DIF FFT algorithm. Weinstein [34] presented a statistical model for roundoff errors of the FP FFT. Kaneko and Liu [25] presented a detailed analysis of roundoff error in the FFT DIF algorithm using FP arithmetic. This analysis is later extended by the same authors to the FFT DIT algorithm [27] . Oppenheim and Weinstein [32] discussed in some detail the effects of finite register length on the implementations of digital filters and FFT algorithms.
In order to validate the error analysis, most of the above works compare the theoretical results with experimental simulation. In this paper, we showed how the above error analyses for the FFT algorithms can be mechanically performed using the HOL theorem prover, providing a superior approach to validation by simulation. Our focus was on the process of translating the hand proofs done in the 1960s and 1970s into equivalent proofs in HOL. The analysis we developed is mainly inspired by the work done by Kaneko and Liu [25] , who proposed a general approach to the error analysis problem of DIF FFT algorithm using FP arithmetic. Following a similar idea, we have extended this theoretical analysis for the DIT and FXP FFT algorithms. In all cases, good agreements between formal and theoretical results were obtained.
C. Formalization and Verification of FFT Algorithms
Related work on the formalization and mechanical verification of the FFT algorithm was done by Gamboa [15] using the ACL2 theorem prover. The author formalized the FFT as a recursive data-parallel algorithm using the powerlist data structure. He also presented an ACL2 proof of the correctness of the FFT algorithm by translating the hand proof taken from Misra's seminal paper on powerlists [30] into a mechanical proof in ACL2. In the same line, Capretta [9] presented the formalization of the FFT using the type theory proof tool Coq. To facilitate the definition of the transform by structural recursion, Capretta used the structure of polynomial trees, which is similar to the data structure of powerlists introduced by Misra. Finally, he proved its correctness and the correctness of the inverse Fourier transform (IFT). In another related work, Bjesse [7] described the verification of FFT hardware at the netlist level with an automatic combination of symbolic simulation and theorem proving using the Lava hardware development platform. He proved that the sequential pipelined implementation of the radix-4 DIT FFT is equivalent to the corresponding combinational circuit. He also proved that the abstract implementations of the radix-2 and radix-4 FFTs are equivalent for sizes that are an exponent of four.
While Gamboa [15] and Capretta [9] prove the correctness of the high-level FFT algorithm against the DFT, the verification of Bjesse [7] is performed at the netlist level. In contrast, our work tried to close this gap by formally specifying and verifying the FFT algorithm realizations at different levels of abstraction based on different data types. Besides, the definition used for the FFT in [15] and [9] is based on the radix-2 DIT algorithm. We cover both DIT and DIF algorithms and radices other than 2. The methodology we proposed in this paper is, to the best of our knowledge, the first project of its kind that covers the formal specification and verification of integrated FFT algorithms at different abstraction levels starting from real specification to FP and FXP algorithmic descriptions, down to RT and netlist gate levels.
V. CONCLUSION
In this paper, we described a methodology for the formal specification and verification of DSP system designs at different abstraction levels. We proposed shallow embedding of DSP descriptions at different levels in HOL. For the verification of the transition from FP to FXP levels, we proposed an error analysis approach in which we consider the effects of finite precision in the implementation of DSP systems. These include errors due to the quantization of input samples and system coefficients, and also roundoff accumulation in arithmetic operations. The verification from FXP to RTL and netlist levels is performed using traditional hierarchical verification in HOL. In this paper, we demonstrated our methodology using the case study of FFT algorithms. The approach covers the two canonical forms (DIT and DIF) of realization of the FFT algorithm using real, FP, and FXP arithmetic as well as their RT implementations, each entirely specified in HOL. We proved lemmas to derive expressions for the accumulation of roundoff error in FP and FXP designs compared to the ideal real specification. Then, we proved that FFT RTL implementation implies the corresponding specification at FXP level using classical hierarchical verification in HOL, hence bridging the gap between hardware implementation and high levels of mathematical specification. In this work, we have also contributed to the upgrade and application of established real, complex real, FP, and FXP theories in HOL to the analysis of errors due to finite precision effects, and applied them on the realization of FFT algorithms. Error analyses using theoretical paperand-pencil proofs did exist since the late 1960s while design verification is exclusively done by simulation techniques. We believe this is the first time that a complete formal framework has been proposed for the specification and verification of DSP algorithms at different levels of abstraction. The methodology presented in this paper opens new avenues in using formal methods for the verification of DSP systems as complement to traditional theoretical (analytical) and simulation techniques. We are currently investigating the verification of complex wired and wireless communication systems whose building blocks heavily make use of several instances of the FFT algorithms. As a future work, we also plan to extend the error analyses to cover worst case, average, and variance errors. Finally, we plan to link HOL with computer algebra systems to create a sound, reliable, and powerful system for the verification of DSP systems.
ACKNOWLEDGMENT
The experiments were carried out with CAD tools provided by the Canadian Microelectronics Corporation.
