I. INTRODUCTION
THE short-time Fourier transform (STFT) is a linear transform used to calculate how the spectrum of a signal evolves over time, offering a tradeoff between spectral and temporal resolutions. Whereas the fast Fourier transform (FFT) is adequate for the study of stationary signals, i.e., signals whose parameters do not vary over time, such as sinusoids, the STFT is adequate for nonstationary signals, whose parameters such as amplitude, frequency, and phase can vary over time.
The STFT is mainly used in spectral analysis [1]-[3], being a key element in many applications such as medical applications [4]-[6], digital receivers [7]-[9], and musical and audio signal analysis [10]-[12].
Nowadays, there are two main approaches to calculate the STFT in hardware. The first one consists of using several FFT modules in parallel [1]. Each of these modules calculates all of the STFT frequencies at a certain time instant. The second approach obtains the values for each frequency independently in an iterative fashion [13], [14].
Although both approaches are useful for calculating the STFT, they have important drawbacks. On the one hand, the FFT-based STFT has a high cost in terms of hardware, as it needs the implementation of multiple FFTs in parallel. On the other hand, the iterative STFT generates an accumulative error that increases at each iteration.
This brief presents the feedforward STFT. This new approach requires significantly less hardware than the FFT-based STFT and, at the same time, does not generate accumulative errors like the iterative STFT.
This brief is organized as follows. Section II reviews previous approaches for the STFT. Section III explains the basic principle on which the feedforward STFT is based. Section IV presents the feedforward STFT algorithm. Section V shows the proposed feedforward STFT architecture. Section VI presents the STFT for real-valued inputs. Finally, Section VII summarizes the main conclusion of this brief.
II. STFT REVIEW
In digital systems, the STFT of a discrete signal x[n] is defined as
$$X[n, k] = \sum_{m=0}^{N-1} x[n+m]\, e^{-j\frac{2\pi}{N} mk}$$
where k = 0, 1, . . . , N − 1. In the equation, time and frequency are represented by the discrete variables n and k, respectively.
The STFT at a certain time n corresponds to the FFT of the sequence from samples x[n] to x[n + (N − 1)]. Thus, one approach to calculate the STFT is to use N FFT processors in parallel [1], as shown in Fig. 1. In this case, all of the processors start to calculate the FFT at the same time. Due to the delay of the buffer, the FFTs start with different samples and thus calculate the FFT at different time instants.
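The window-by-window view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dft` and `stft_by_windows` are hypothetical, and a direct O(N^2) DFT stands in for each FFT processor so that the example needs nothing beyond the standard library.

```python
import cmath

def dft(window):
    """Direct N-point DFT of one window (O(N^2); chosen for clarity, not speed)."""
    n_points = len(window)
    return [
        sum(window[m] * cmath.exp(-2j * cmath.pi * m * k / n_points)
            for m in range(n_points))
        for k in range(n_points)
    ]

def stft_by_windows(samples, n_points):
    """One spectrum per time index n, computed on x[n] .. x[n + N - 1]."""
    return [dft(samples[n:n + n_points])
            for n in range(len(samples) - n_points + 1)]

# Five samples and N = 4 give two overlapping windows, hence two spectra.
spectra = stft_by_windows([1, 0, 0, 0, 1], 4)
```

Every window recomputes all N outputs from scratch, which is the redundancy that the rest of the brief sets out to remove.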
Considering that each FFT is implemented by a single-path delay feedback FFT [1], each FFT processor has 2 log_2 N adders, log_2 N multipliers, and a total memory size of N. Therefore, the entire STFT has 2N log_2 N adders, N log_2 N multipliers, and a memory size of N^2.
Another approach consists of calculating the STFT for each frequency independently [13], [14]. This is done by the iterative structure in Fig. 2, based on the following:
$$X[n+1, k] = e^{j\frac{2\pi}{N}k} \left( X[n, k] + x[n+N] - x[n] \right)$$
Therefore, at each time instant, each frequency is updated with the incoming value. In this case, the number of adders and multipliers is N, and the total memory size is 2N. The hardware cost is smaller in the iterative approach compared to the FFT-based one. However, the iterative approach has the drawback that the quantization error accumulates at each iteration [14].
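A minimal sketch of the iterative approach follows. It assumes the standard sliding update X[n+1, k] = e^{j2πk/N} (X[n, k] + x[n+N] − x[n]), which follows from taking the STFT at time n as the DFT of x[n] to x[n + N − 1]; the function name `iterative_stft` is hypothetical.

```python
import cmath

def iterative_stft(samples, n_points):
    """Update every frequency from its previous value and the incoming sample."""
    rotations = [cmath.exp(2j * cmath.pi * k / n_points) for k in range(n_points)]
    # The first spectrum is computed directly; later ones are updated in place.
    spectrum = [
        sum(samples[m] * cmath.exp(-2j * cmath.pi * m * k / n_points)
            for m in range(n_points))
        for k in range(n_points)
    ]
    spectra = [list(spectrum)]
    for n in range(len(samples) - n_points):
        delta = samples[n + n_points] - samples[n]  # new sample in, old sample out
        spectrum = [rotations[k] * (spectrum[k] + delta) for k in range(n_points)]
        spectra.append(list(spectrum))
    return spectra
```

Because each spectrum is derived from the previous one, any rounding error in `spectrum` is carried into all later outputs. In double precision this is negligible for short runs, but it illustrates why fixed-point hardware accumulates quantization error.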
III. BASIC PRINCIPLE OF THE FEEDFORWARD STFT
The feedforward STFT is based on the radix-2 decimation in time (DIT) FFT. The DIT algorithm has the property that the STFT can reuse operations of consecutive FFTs as shown next.
The flow graph of a radix-2 DIT FFT for N = 16 points is shown in Fig. 3. The numbers at the input represent the index of the input sequence x[n], whereas those at the output are the frequencies k of the output signal X[k]. In the flow graph, the input values are depicted in bit-reversed order, and the output frequencies are in natural order.
The flow graph consists of a series of n = log_2 N stages, s ∈ {1, . . . , n}, where additions, subtractions, and complex multiplications are calculated. Additions and subtractions come in pairs, forming a structure called a butterfly. The flow graph in Fig. 3 assumes that the lower edge of each butterfly is always multiplied by −1. This −1 is usually not depicted in order to simplify the graphs.
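The stage and butterfly structure can be illustrated with a recursive radix-2 DIT FFT. This is a sketch rather than the flow graph of Fig. 3 itself: it takes the input in natural order and lets the recursion perform the reordering, it assumes the standard twiddle factor e^{−j2πk/N}, and the name `dit_fft` is hypothetical.

```python
import cmath

def dit_fft(values):
    """Recursive radix-2 decimation-in-time FFT for a power-of-two length."""
    n_points = len(values)
    if n_points < 1 or n_points & (n_points - 1):
        raise ValueError("length must be a power of two")
    if n_points == 1:
        return [complex(values[0])]
    even = dit_fft(values[0::2])  # DFT of the even-indexed samples
    odd = dit_fft(values[1::2])   # DFT of the odd-indexed samples
    half = n_points // 2
    result = [0j] * n_points
    for k in range(half):
        rotated = cmath.exp(-2j * cmath.pi * k / n_points) * odd[k]
        result[k] = even[k] + rotated         # upper edge of the butterfly
        result[k + half] = even[k] - rotated  # lower edge, multiplied by -1
    return result
```

Each level of recursion corresponds to one stage of the flow graph, and each loop iteration is one butterfly preceded by one twiddle multiplication.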
The multiplications are represented by the numbers between the stages. Each number φ in between the stages indicates a multiplication by the twiddle factor
$$W_N^{\phi} = e^{-j\frac{2\pi}{N}\phi}$$
Let us imagine that we are calculating the STFT, and therefore, we have to calculate consecutive FFTs. Fig. 4 shows this case. The first FFT is calculated on samples x[0] to x[15], the second FFT is calculated on samples x[1] to x[16], and the third FFT is calculated on samples x[2] to x[17]. In each consecutive FFT, a new sample arrives at the input, and another sample is discarded. For example, the second FFT discards sample x[0] and incorporates sample x[16]. As Fig. 4 shows, most butterflies are shared between consecutive FFTs; at the first stage, only one new butterfly is needed for each incoming sample.
Regarding multiplications, the DIT algorithm has the particularity that multiplications can be reused for the STFT in the same way as the butterflies. Indeed, in Fig. 3, the flow graph until stage 2 is formed by four equal parts, and the flow graph until stage 3 is formed by two equal parts. This shows the large number of operations that are repeated among consecutive FFTs of the STFT.
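The amount of reuse can be counted with a small bookkeeping sketch. It assumes the usual DIT decomposition, in which stage s of an N-point FFT consists of N/2^s partial DFTs of 2^s points, each costing 2^(s−1) butterflies; the labels and function names are hypothetical and serve only to identify which partial DFTs two consecutive windows have in common.

```python
def partial_dfts(start, n_points):
    """Label every partial DFT used by the DIT FFT of x[start] .. x[start + N - 1].

    A label (s, t) stands for the 2**s-point DFT of x[t], x[t + N/2**s], ...
    """
    stages = n_points.bit_length() - 1
    return {(s, start + j)
            for s in range(1, stages + 1)
            for j in range(n_points >> s)}

def butterflies(labels):
    """Each 2**s-point partial DFT needs 2**(s - 1) butterflies at its stage."""
    return sum(1 << (s - 1) for s, _ in labels)

def new_butterflies(start, n_points):
    """Butterflies needed by the next window that the current one did not compute."""
    fresh = partial_dfts(start + 1, n_points) - partial_dfts(start, n_points)
    return butterflies(fresh)

total = butterflies(partial_dfts(0, 16))  # butterflies in one full 16-point FFT
fresh = new_butterflies(0, 16)            # butterflies unique to the next window
```

For N = 16, a full FFT uses 32 butterflies, while the next window introduces only 15 new ones (1, 2, 4, and 8 at stages 1 to 4), which matches the 2N − 2 additions per sample quoted for the proposed algorithm.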
IV. PROPOSED FEEDFORWARD STFT ALGORITHM
for i = 1 : length(input),
The proposed algorithm is separated in stages for an easier explanation. First, the input value x[i] is saved in the variable x00. The first stage consists of a memory b10 and the two variables x10 and x11. The memory implements a circular buffer of size N/2, addressed by the variable m1. Thus, when all of the memory has been filled, it starts to be filled from the beginning. The two variables x10 and x11 are the output of the butterfly of the first stage, whose inputs are the value in the buffer and the input signal.
Note that the first stage only calculates a butterfly for each incoming input sample. This agrees with Fig. 4, where only the calculation of one butterfly is needed at the first stage. Likewise, the two samples that are operated on in the butterfly are N/2 samples apart. This is the reason why the buffer has a length of N/2.
The second stage includes the buffers b20 and b21, each of length N/4, and four variables x20 to x23 that save the results of the butterflies. Before the butterflies are calculated, the input x11 is multiplied by the twiddle factor W_8^2 = −j. After the butterflies are calculated, the buffers b20 and b21 are updated.
The third stage is similar to the second one. It only differs in the twiddle factors that are calculated and the number of butterflies.
After the third stage, the output of the STFT is obtained and stored in the variable "STFT." For this purpose, the outputs are assigned according to the bit reversal algorithm [15] .
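The stage-by-stage description above can be sketched for any power-of-two N as follows. This is an illustrative Python reading of the text, not the authors' pseudocode: the function name `feedforward_stft` and the list-based buffers are assumptions, and the outputs are indexed directly in natural order (k and k + 2^(s−1)), so no separate bit-reversal step is needed here.

```python
import cmath

def feedforward_stft(samples, n_points):
    """Streaming STFT: stage s combines a value delayed by N/2**s with the new one."""
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("n_points must be a power of two, at least 2")
    buffers, twiddles = [], []
    for s in range(1, stages + 1):
        half = 1 << (s - 1)    # number of values entering stage s
        delay = n_points >> s  # circular buffer length per value
        buffers.append([[0j] * delay for _ in range(half)])
        twiddles.append([cmath.exp(-2j * cmath.pi * k / (2 * half))
                         for k in range(half)])
    outputs = []
    for i, sample in enumerate(samples):
        current = [complex(sample)]
        for s in range(stages):
            half = len(current)
            pos = i % (n_points >> (s + 1))  # circular buffer address
            following = [0j] * (2 * half)
            for k, new in enumerate(current):
                old = buffers[s][k][pos]     # value from N/2**(s+1) samples ago
                buffers[s][k][pos] = new     # overwrite with the newest value
                rotated = twiddles[s][k] * new
                following[k] = old + rotated
                following[k + half] = old - rotated
            current = following
        if i >= n_points - 1:                # first full window is available
            outputs.append(current)
    return outputs
```

Each incoming sample triggers 2^(s−1) multiplications and 2^s additions at stage s, with no feedback from previous outputs. Entry t of the returned list is the N-point DFT of samples t to t + N − 1.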
In a general case for any N, the pseudocode of the feedforward STFT algorithm is shown in the following:
The number of additions of the proposed algorithm is 2N − 2, and the number of multiplications is N − 1 per incoming sample. This can be compared to the use of an FFT-based STFT in software: the FFT-based STFT requires the calculation of one FFT for each incoming sample. In software, this means N log_2 N additions and (N/2) log_2 N multiplications per incoming sample. Thus, the proposed algorithm has fewer additions and multiplications.
Fig. 5 compares the proposed algorithm to the FFT-based STFT for the case of an eight-point STFT, using an Intel CORE i3 and MATLAB R2009b. The simulation runs 100 iterations of each algorithm. Fig. 5 shows the time of each iteration. In the proposed algorithm, the average time per iteration is 13.29 μs, whereas the average time for the FFT-based approach is 15.69 μs. This represents savings of about 15% for the proposed algorithm relative to the FFT-based approach; equivalently, the FFT-based approach takes about 18% longer.
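The operation counts and the timing ratio can be checked with a short calculation. The function names below are hypothetical; the formulas and the two average times are the ones quoted in the text. The ratio is computed against both baselines, since the percentage depends on which time is taken as the reference.

```python
def operations_per_sample(n_points):
    """(additions, multiplications) per incoming sample for a power-of-two N."""
    log_n = n_points.bit_length() - 1  # log2(N)
    return {
        "feedforward": (2 * n_points - 2, n_points - 1),
        "fft_based": (n_points * log_n, n_points * log_n // 2),
    }

def relative_difference(proposed, reference, baseline):
    """Time difference between the two approaches as a fraction of `baseline`."""
    return (reference - proposed) / baseline

counts = operations_per_sample(8)
saving_vs_fft = relative_difference(13.29, 15.69, baseline=15.69)
extra_vs_proposed = relative_difference(13.29, 15.69, baseline=13.29)
```

For N = 8 this gives 14 additions and 7 multiplications for the feedforward algorithm against 24 and 12 for the FFT-based one. The measured difference of 2.40 μs is about 15% of the FFT-based time and about 18% of the feedforward time.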
V. PROPOSED FEEDFORWARD STFT ARCHITECTURE
The hardware implementation of the feedforward STFT is shown in Fig. 6 for the case of N = 16. It consists of four stages with butterflies, multipliers, and buffers. Boxed numbers represent the length of the buffers, whereas numbers close to the multipliers indicate the value φ of the twiddle factor. Note that all of the multipliers in the feedforward STFT multiply by a constant, which simplifies the design.
As in the feedforward STFT algorithm, the number of butterflies and multipliers increases with the stage. Specifically, the number of adders in stage s is 2^s, the number of constant multipliers is 2^(s−1), and the memory per stage is N/2. This leads to a total of 2N − 2 adders, N − 1 multipliers, and a memory size of (N/2) log_2 N in the entire architecture.
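The totals follow from summing the per-stage figures over the log_2 N stages, as the following sketch shows. The function names are hypothetical; the per-stage counts are those stated above.

```python
def stage_resources(n_points):
    """Adders, constant multipliers, and memory words for each stage s = 1 .. log2(N)."""
    stages = n_points.bit_length() - 1
    return [
        {"stage": s, "adders": 2 ** s, "multipliers": 2 ** (s - 1),
         "memory": n_points // 2}
        for s in range(1, stages + 1)
    ]

def total_resources(n_points):
    """Sum the per-stage resources over the whole architecture."""
    rows = stage_resources(n_points)
    return {key: sum(row[key] for row in rows)
            for key in ("adders", "multipliers", "memory")}

totals_16 = total_resources(16)
```

For N = 16 the sums are 2 + 4 + 8 + 16 = 30 adders, 1 + 2 + 4 + 8 = 15 multipliers, and 4 × 8 = 32 memory words, in agreement with 2N − 2, N − 1, and (N/2) log_2 N.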
From the DIT FFT algorithm, the feedforward STFT architecture inherits the property that the output of each stage s provides the result of a 2^s-point STFT.
Table I compares the FFT-based, iterative, and feedforward STFTs. Compared to the FFT-based STFT, the feedforward STFT significantly reduces the number of adders and multipliers and the amount of memory. Compared to the iterative STFT, the feedforward STFT has a comparable number of adders and multipliers and has the advantage that it does not have accumulative error in the calculations.
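The comparison can be made concrete by evaluating the cost expressions given in Sections II and V for a chosen N. The sketch below only restates those expressions; the function name and dictionary layout are hypothetical.

```python
def stft_hardware_costs(n_points):
    """Adders, multipliers, and memory for the three approaches (power-of-two N)."""
    n = n_points
    log_n = n.bit_length() - 1  # log2(N)
    return {
        "fft_based": {"adders": 2 * n * log_n, "multipliers": n * log_n,
                      "memory": n * n},
        "iterative": {"adders": n, "multipliers": n, "memory": 2 * n},
        "feedforward": {"adders": 2 * n - 2, "multipliers": n - 1,
                        "memory": (n // 2) * log_n},
    }

costs_16 = stft_hardware_costs(16)
for name, cost in costs_16.items():
    print(name, cost["adders"], cost["multipliers"], cost["memory"])
```

For N = 16 the FFT-based approach needs 128 adders, 64 multipliers, and 256 memory words; the iterative approach needs 16, 16, and 32; and the feedforward approach needs 30, 15, and 32.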
VI. STFT FOR REAL-VALUED INPUTS
The STFT version for real-valued inputs is derived from the proposed STFT by following the approach in [16]. This approach considers the property that, in a real-valued FFT,
$$X[N - k] = X^{*}[k]$$
This allows for removing the calculation of part of the flow graph as shown in Fig. 7. Specifically, X[12] is obtained from X[4], X[14] and X[6] are obtained from X[2] and X[10], and X[15], X[7], X[11], and X[3] are obtained from X[1], X[9], X[5], and X[13], respectively.
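The conjugate symmetry of a real-valued transform, X[N − k] = X*[k], can be illustrated as follows. The sketch keeps bins 0 to N/2 and rebuilds the rest, which is a simpler choice than the particular set of frequencies retained in Fig. 7 but relies on the same property; the function names are hypothetical.

```python
import cmath

def half_spectrum(window):
    """DFT bins 0 .. N/2 of a real-valued window (direct computation for clarity)."""
    n_points = len(window)
    return [
        sum(window[m] * cmath.exp(-2j * cmath.pi * m * k / n_points)
            for m in range(n_points))
        for k in range(n_points // 2 + 1)
    ]

def full_from_half(half, n_points):
    """Rebuild bins N/2 + 1 .. N - 1 from X[N - k] = conj(X[k])."""
    return [half[k] if k <= n_points // 2 else half[n_points - k].conjugate()
            for k in range(n_points)]

window = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.5, 4.0]
spectrum = full_from_half(half_spectrum(window), len(window))
```

Only N/2 + 1 bins are computed explicitly; the remaining N/2 − 1 come from a conjugation, which costs no additions or multiplications.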
The impact on the feedforward STFT algorithm is that certain lines of the algorithm can be removed. Specifically, the lines marked with a star (*) in (4) do not need to be calculated. They correspond to the frequencies that can be obtained from their symmetric frequency.
With respect to the STFT architecture, a number of butterflies, buffers, and multipliers can be removed from the feedforward STFT in Fig. 6, leading to the architecture in Fig. 8. Specifically, the datapaths for the frequencies that can be obtained from their symmetric frequency are removed. Furthermore, in the real feedforward STFT in Fig. 8, some of the datapaths are real, which simplifies the structure. This comes from the flow graph in Fig. 7, where all signals are real until they reach the first complex multiplications, highlighted in boxes.
VII. CONCLUSION
This brief has presented the feedforward STFT. It reuses operations among consecutive time instances of the FFT calculations. At the algorithmic level, this results in fewer operations than calculating the STFT with the use of the FFT. At the architectural level, the result is fewer operations than FFT-based STFTs, without the accumulative error of iterative STFTs.
The feedforward STFT can also be adapted to real-valued inputs, leading to further savings in calculations.
