This paper proposes an efficient reconfigurable hardware design for speech enhancement based on a multi-band spectral subtraction algorithm that involves both the magnitude and the phase components of the spectrum. The proposed design is novel in that it estimates environmental noise from speech adaptively, using both the magnitude and the phase components of the speech spectrum. We perform multi-band spectral subtraction by dividing the noisy speech spectrum into non-uniform frequency bands with varying signal-to-noise ratio (SNR) and subtracting the estimated noise from each of these bands. This eliminates noise from both high-SNR and low-SNR signal components in all frequency bands. We call the proposed speech enhancement technique Multi Band Magnitude Phase Spectral Subtraction (MBMPSS). The magnitude and phase operations are executed concurrently on the parallel logic blocks of a Field Programmable Gate Array (FPGA), which substantially increases the throughput of the system. We have implemented our design on a Spartan-6 LX45 FPGA and report the resource utilization and delay of the different blocks of the design. To the best of our knowledge, this is a new type of design for speech enhancement and the first implementation of its kind on reconfigurable hardware. We evaluated the proposed hardware on benchmark audio data, and the experimental results show that it achieves a higher SNR than existing state-of-the-art methods.
Introduction
Speech enhancement aims to improve the quality of speech in a noisy environment. Spectral subtraction is a well-known technique for removing noise from speech; it was originally introduced by S. Boll [1], and an improved version for reducing musical noise was introduced by Berouti et al. [2]. The general principle of spectral subtraction is to estimate the noise from the magnitude spectrum and subtract it from the noisy signal while keeping the phase of the spectrum unchanged. This general technique suffers from three kinds of error [3]: error in noise estimation, error due to ignoring the speech-noise cross terms in the magnitude spectrum, and error due to combining a noisy phase spectrum with a clean magnitude spectrum during signal reconstruction. These errors degrade the performance of speech enhancement and have been reported in [4] [5] [6] for speech enhancement and speech recognition methods. When the SNR of the signal is high, the noisy phase is close to the clean phase and the above methods work properly. When the SNR drops, however, cross-term errors appear, and the noisy phase combined with the clean magnitude plays a more significant role and degrades the reconstruction. Recently, real and imaginary modulation spectral subtraction for speech enhancement was introduced by Yi Zhang et al. [3], in which the subtraction is performed on both the real and the imaginary parts of the spectrum. In addition, real-world noise, also called colored noise, affects the signal differently over time and frequency: unlike white Gaussian noise, its spectrum is not flat. The multi-band spectral subtraction method for speech enhancement was introduced by Kamath [7], in which the spectrum is divided into several bands for more effective noise reduction. In [8], a multi-band spectral subtraction design based on magnitude compensation and phase modification was proposed.
A phase-based dual-microphone algorithm for robust speech enhancement is presented in [9]. Further research based on the spectral subtraction algorithm can be found in [11] [12] [13] [14] [15] [16] [17], and speech enhancement based on hardware-software co-design on FPGA platforms in [10] [18] [19] [20].
In this paper, we perform noise estimation from both the magnitude and the phase spectrum by dividing the whole noisy speech spectrum into non-uniform, linearly shaped frequency bands, and then subtract the estimated noise from each band using a band-specific, SNR-dependent over-subtraction factor α. This allows the proposed hardware design to perform well for both high-SNR and low-SNR signals across the different frequency bands. The hardware can be realized in two ways: (a) off-the-shelf Digital Signal Processors (DSPs) and (b) FPGAs. We chose an FPGA as the target hardware because it allows parallel computation using configurable logic cells [21] and dedicated DSP blocks. This leads to faster execution of the hardware tasks, which is our primary objective. We used the Xilinx System Generator tool in the MATLAB/SIMULINK environment [22] to design and verify the hardware. We report comparative SNR results of the proposed architecture against Magnitude Spectral Subtraction (MSS) [1], Magnitude Phase Spectral Subtraction (MPSS) [3] and Multi Band Magnitude Spectral Subtraction (MBMSS) [7] for different noisy signals, which show that our design yields better performance. We also report the resource utilization and delay of the proposed architecture. The major contributions of this work can be summarized as follows:
1. A new speech enhancement method that is more robust than state-of-the-art methods (MSS [1], MPSS [3], MBMSS [7]), as indicated by its improved SNR performance.
2. FPGA based hardware design and implementation of the proposed speech enhancement methodology.
This paper is organized as follows. Section 2 gives a brief background on the magnitude spectral subtraction and multi-band spectral subtraction algorithms; Section 3 presents the proposed hardware design for speech enhancement; Section 4 describes the hardware implementation; Section 5 presents the performance analysis; and Section 6 gives concluding remarks.
Background
In this section we discuss some fundamental aspects of the spectral subtraction technique that are needed to understand the work presented in the subsequent sections.
Spectral Subtraction Algorithm
Spectral subtraction is a procedure for restoring the power spectrum or the magnitude spectrum of a signal observed in additive noise, by subtracting an estimate of the average noise spectrum from the noisy signal spectrum. The noisy signal in the time domain is represented as:

$$ y(m) = x(m) + n(m) \tag{1} $$
where y(m), x(m) and n(m) are the noisy signal, the clean signal and the additive noise, respectively, and m is the discrete time index. The frequency-domain noisy signal model corresponding to equation (1) is:

$$ Y(f) = X(f) + N(f) \tag{2} $$
where Y(f), X(f) and N(f) are the frequency-domain counterparts of y(m), x(m) and n(m), respectively. The noise estimation filter calculates N(f) from the noisy spectrum; the magnitude of N(f) is estimated as its average value during periods without speech activity. Spectral error [14] arises from the subtraction estimator; it can be reduced by simple modifications such as magnitude averaging, half-wave rectification, residual noise reduction and additional signal attenuation during non-speech activity. Discontinuities at the end points of each segment are reduced by windowing the signal with a window function w(m):

$$ y_\omega(m) = w(m)\,y(m) = w(m)\,[x(m) + n(m)] \tag{3} $$
In the frequency domain, the windowed signal can be expressed as:

$$ Y_\omega(f) = W(f) * Y(f) = X_\omega(f) + N_\omega(f) \tag{4} $$
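where the operator * denotes convolution. A scaled estimate of the noise magnitude spectrum, $\hat{N}_\omega(f)$, is subtracted from the corresponding spectrum of the noisy signal, $Y_\omega(f)$, to estimate the clean speech $\hat{S}_\omega(f)$:

$$ |\hat{S}_\omega(f)|^\gamma = |Y_\omega(f)|^\gamma - \alpha\,|\hat{N}_\omega(f)|^\gamma \tag{5} $$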
The frequency-dependent subtraction factor α is included to compensate for deviations of the instantaneous noise spectrum from its estimated average; γ = 1 for magnitude spectral subtraction and γ = 2 for power spectral subtraction. The enhanced signal spectrum is obtained by combining the magnitude estimate $|\hat{S}(f)|$ with the phase φ(f) of the corrupted input signal:

$$ \hat{S}(f) = |\hat{S}(f)|\, e^{j\phi(f)} \tag{6} $$
Finally, the clean signal is obtained by the inverse Fourier transform of $\hat{S}(f)$:

$$ \hat{x}(m) = \mathcal{F}^{-1}\{\hat{S}(f)\} \tag{7} $$
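The steps above (windowing, subtracting the scaled noise estimate in the γ-power domain, reusing the noisy phase, inverse transform) can be sketched for a single frame in NumPy. This is a minimal illustration, not the paper's hardware: the function name `spectral_subtract`, the default rectangular window and the clamping of negative differences to zero (half-wave rectification) are assumptions.

```python
import numpy as np

def spectral_subtract(y_frame, noise_mag, alpha=1.0, gamma=2.0, window=None):
    """Single-frame spectral subtraction (illustrative sketch).

    y_frame   : real-valued noisy frame y(m)
    noise_mag : estimated noise magnitude spectrum |N_w(f)|, same length as the frame
    alpha     : subtraction factor; gamma = 1 (magnitude) or 2 (power) subtraction
    window    : optional window w(m); a rectangular window is assumed if None
    """
    y = np.asarray(y_frame, dtype=float)
    w = np.ones_like(y) if window is None else np.asarray(window, dtype=float)
    Y = np.fft.fft(w * y)                                   # windowed noisy spectrum Y_w(f)
    diff = np.abs(Y) ** gamma - alpha * np.asarray(noise_mag) ** gamma
    mag = np.maximum(diff, 0.0) ** (1.0 / gamma)            # half-wave rectification (assumed), then |S_w(f)|
    S = mag * np.exp(1j * np.angle(Y))                      # keep the noisy phase phi(f)
    return np.fft.ifft(S).real                              # back to the time domain
```

With a zero noise estimate the frame passes through unchanged; with a noise estimate equal to the frame's own magnitude spectrum and α = 1, everything is removed.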
This general spectral subtraction method gives better speech enhancement for high-SNR signals than for low-SNR signals. Combining magnitude and phase spectral subtraction enhances both high- and low-SNR signals [3].
Multi Band Spectral Subtraction Algorithm
Most real-world noise is colored and affects the signal differently over time and frequency. The multi-band spectral subtraction algorithm [7] performs the subtraction separately in individual frequency bands for better speech enhancement. The clean speech spectrum estimate for the model of equation (2) can be written as:
where α is the over-subtraction factor, which is a function of the segmental SNR. General spectral subtraction methods assume that noise affects the spectrum uniformly and apply a single over-subtraction factor α over the whole spectrum. In the real world, however, noise is a random phenomenon: colored noise affects the signal spectrum differently at different frequencies, so the segmental SNR varies from one frequency band to another [7]. The variation of the estimated SNR over four linearly spaced frequency bands is shown in Fig. 1. The speech spectrum is divided into non-overlapping bands and the subtraction is performed on each band independently. The enhanced signal spectrum of the ith frequency band is therefore:
where $\alpha_i$ is the over-subtraction factor of the ith frequency band, $\delta_i$ is the tweaking factor of the ith band, and $b_i$ and $e_i$ are the beginning and ending frequency bins of the ith band. The over-subtraction factor depends directly on the segmental SNR of the band, which is calculated as:
Depending on the value of $SNR_i$, the over-subtraction factor $\alpha_i$ is evaluated as:
The over-subtraction factor $\alpha_i$ controls the amount of subtraction in each frequency band, and the tweaking factor $\delta_i$ provides an additional degree of control within each band. The value of $\delta_i$ [7] is given by:
where $f_i$ is the upper frequency of the ith band and $F_s$ is the sampling frequency.
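The formulas for $SNR_i$, $\alpha_i$ and $\delta_i$ are those of [7]. Assuming the standard values of that multi-band method apply here (the thresholds −5 dB, 20 dB, 1 kHz and $F_s/2$ − 2 kHz come from [7] and are not restated in this paper), they can be sketched as below. The function names are hypothetical.

```python
import math

def band_snr_db(noisy_band, noise_band):
    """Segmental SNR of one band: 10*log10(sum |Y_i|^2 / sum |N_i|^2), in dB."""
    signal_power = sum(abs(v) ** 2 for v in noisy_band)
    noise_power = sum(abs(v) ** 2 for v in noise_band)
    return 10.0 * math.log10(signal_power / noise_power)

def over_subtraction_factor(snr_db):
    """alpha_i (values assumed from [7]): 4.75 below -5 dB, 1 above 20 dB, linear in between."""
    if snr_db < -5.0:
        return 4.75
    if snr_db > 20.0:
        return 1.0
    return 4.0 - 3.0 * snr_db / 20.0

def tweaking_factor(f_upper_hz, fs_hz):
    """delta_i (values assumed from [7]), chosen by where the band's upper edge f_i lies."""
    if f_upper_hz <= 1000.0:
        return 1.0
    if f_upper_hz <= fs_hz / 2.0 - 2000.0:
        return 2.5
    return 1.5
```

The piecewise-linear $\alpha_i$ is continuous at both thresholds (4.75 at −5 dB, 1 at 20 dB), so a small change in band SNR never causes a jump in the amount of subtraction.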
Subtracting the estimated noise from every frequency band, each with its own segmental SNR, yields the enhanced frequency bands.
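To make $b_i$, $e_i$ and $f_i$ concrete, the sketch below computes them for K linearly spaced, non-overlapping bands. It assumes that only the non-negative-frequency bins 0 to N/2 of an N-point FFT are banded (appropriate for a real input) and that the last band absorbs any leftover bins; `linear_bands` is a hypothetical helper.

```python
def linear_bands(n_fft: int, n_bands: int, fs_hz: float):
    """Return (b_i, e_i, f_i) for linearly spaced, non-overlapping bands.

    b_i/e_i are the first/last FFT bin of band i (inclusive) and f_i is the
    upper edge frequency of the band in Hz. Only bins 0..n_fft/2 are used
    (assumption for a real-valued input signal).
    """
    n_bins = n_fft // 2 + 1
    width = n_bins // n_bands
    bin_hz = fs_hz / n_fft                     # frequency spacing of FFT bins
    bands = []
    for i in range(n_bands):
        b = i * width
        # the last band absorbs any bins left over by the integer division
        e = n_bins - 1 if i == n_bands - 1 else (i + 1) * width - 1
        bands.append((b, e, e * bin_hz))
    return bands
```

For N = 256 and $F_s$ = 16 kHz each bin spans 62.5 Hz, and the four bands are (0, 31, 1937.5 Hz), (32, 63, 3937.5 Hz), (64, 95, 5937.5 Hz) and (96, 128, 8000 Hz).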
Proposed Hardware Design
The discussion above shows that the MSS technique enhances high-SNR signals, and that the MBMSS technique enhances high-SNR signals across different frequency bands. We propose a novel MBMPSS technique, which enhances both high-SNR and low-SNR signals across the different frequency bands.
The proposed design has four principal blocks: the magnitude multi-band separation block, the magnitude noise estimation-subtraction block, the phase multi-band separation block and the phase noise estimation-subtraction block. The magnitude and phase operations are executed in parallel. The proposed architecture is shown in Fig. 2. From equation (9) we have,
where $\alpha_i$ is the over-subtraction factor of the ith frequency band. The magnitude and phase spectra of the signal are divided into frequency bands, and in each band the noise estimated from the magnitude and phase spectra is subtracted. The noise estimates obtained from the magnitude and phase spectra of the signal are $\hat{N}^i_{\omega mg}(f)$ and $\hat{N}^i_{\omega ph}(f)$, respectively; they are subtracted from the noisy magnitude and phase spectra of each frequency band, weighted by $\alpha_i$.
where $|\hat{S}^i_{\omega mg}(f)|$ and $\hat{S}^i_{\omega ph}(f)$ are the clean magnitude and phase spectra of the ith frequency band, $|Y^i_{\omega mg}(f)|$ and $Y^i_{\omega ph}(f)$ are the noisy magnitude and phase spectra of the ith frequency band, and $\alpha_i$ is calculated by the SNR computation block.
The enhanced magnitude spectrum $|\hat{S}^i_{\omega mg}(f)|$ and the enhanced phase spectrum $\hat{S}^i_{\omega ph}(f)$ of the signal are combined during signal reconstruction.
$S_{mp}(f)$ is the enhanced spectrum of the noisy speech signal. We claim that the proposed design achieves a higher signal-to-noise ratio (SNR) than other existing state-of-the-art architectures. The magnitude and phase operations are executed concurrently on the parallel logic blocks of the FPGA.
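A compact software model of one frame of the MBMPSS data path described above may help fix ideas. It is a sketch under stated assumptions, not the authors' implementation: `mbmpss_frame` is a hypothetical name, the bands are linearly spaced over all FFT bins (mirroring the four-register split of Section 4), negative magnitudes are clamped to zero, and $\delta_i$ is omitted because the text lists only $\alpha_i$ for the proposed design. Applying a per-bin α array is equivalent to splitting the spectrum into non-overlapping bands, subtracting in each band and summing the bands with an adder.

```python
import numpy as np

def mbmpss_frame(Y, noise_mag, noise_ph, alphas):
    """Hypothetical frame-level model of magnitude and phase subtraction per band."""
    Y = np.asarray(Y, dtype=complex)
    n = len(Y)
    k = len(alphas)                                     # number of bands
    width = max(n // k, 1)
    band = np.minimum(np.arange(n) // width, k - 1)     # linear bands; last band takes the remainder
    a = np.asarray(alphas, dtype=float)[band]           # per-bin over-subtraction factor alpha_i
    mag = np.abs(Y) - a * np.asarray(noise_mag)         # magnitude path
    mag = np.maximum(mag, 0.0)                          # clamp negative magnitudes (assumed)
    ph = np.angle(Y) - a * np.asarray(noise_ph)         # phase path (runs in parallel in hardware)
    return mag * (np.cos(ph) + 1j * np.sin(ph))         # recombine magnitude and phase
```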
Hardware Implementation
The proposed architecture is implemented on reconfigurable FPGA hardware. The time-domain noisy speech signal is converted into the frequency domain by the Fast Fourier Transform (FFT) block, and the result is separated into magnitude and phase components by the CORDIC ARCTAN DSP block. Noise is estimated from the magnitude and phase spectra by the noise estimation blocks. The multi-band separation blocks divide the magnitude and phase spectra into four frequency bands [7], and the SNR computation blocks calculate the over-subtraction factor ($\alpha_i$) for each band. The magnitude and phase subtraction blocks subtract the estimated noise from each frequency band using the corresponding $\alpha_i$. The adder blocks combine the enhanced magnitude and phase spectra of the individual bands into the enhanced magnitude and phase spectra of the whole signal. The enhanced phase passes through the CORDIC SINCOS block to generate its cosine and sine (real and imaginary) components, which are multiplied by the enhanced magnitude spectrum in the multiplier blocks and passed through the inverse FFT (IFFT) block to reconstruct the signal. The output of the IFFT block is the enhanced speech signal. The block diagram of the proposed hardware design is shown in Fig. 3.
Fast Fourier Transform
Real-world noisy speech signals are time-domain signals; converting them to the frequency domain requires a Fourier transform, which is performed by the Fast Fourier Transform (FFT) block. The FFT block produces the real and imaginary components of the spectrum. We used the FFT block of the Xilinx System Generator platform, with the pipelined input/output option selected to obtain a pipelined implementation. For performance optimization, the 4-multiplier structure is used and the phase factor is set to 8. The signal is segmented into non-overlapping windows of 256 samples, and the data are stored in two stages of block RAM (BRAM). The interface of the FFT block is shown in Fig. 4a. The real and imaginary components are transferred to the magnitude and phase separation block.
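In software terms, this stage amounts to splitting the input into non-overlapping 256-sample frames and taking a 256-point FFT of each. The NumPy sketch below is an illustrative model, not the Xilinx FFT core; the name `frame_spectra` and the dropping of a trailing partial frame are assumptions. At 16 kHz each frame covers 16 ms, and at 8 kHz 32 ms.

```python
import numpy as np

def frame_spectra(x, frame_len=256):
    """Split x into non-overlapping frames and return one FFT per frame (rows)."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len                  # a trailing partial frame is dropped (assumption)
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.fft(frames, axis=1)               # complex bins: .real and .imag feed the next stage
```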
Magnitude and Phase Separation Block
The output of the FFT block drives the CORDIC ARCTAN block, which converts the real and imaginary components into magnitude and phase form. The magnitude $Y_{\omega mg}(f)$ and the phase $Y_{\omega ph}(f)$ are passed to the magnitude noise estimation block and the phase noise estimation block, respectively.
In the CORDIC ARCTAN block [25], the architectural configuration is set to parallel mode for high throughput, the pipelining mode is set to maximum and the phase format is set to radians. The output width of this block … Its interface is shown in Fig. 4b. The magnitude and phase spectra are fed in parallel to the magnitude and phase noise estimation blocks, where band separation, noise estimation and subtraction are performed.
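Conceptually, the magnitude/phase separation is the CORDIC vectoring iteration: the vector (re, im) is rotated by ±atan(2^-i) at each step until its imaginary part reaches zero, which leaves the gain-scaled magnitude in x and the accumulated angle (the phase, in radians) in z. The floating-point model below illustrates only that iteration; it is not the Xilinx CORDIC core, and the 16-iteration count, the ±π pre-rotation and the name `cordic_vectoring` are assumptions. In hardware the multiplications by 2^-i are bit shifts and the atan values come from a small lookup table.

```python
import math

def cordic_vectoring(x: float, y: float, iterations: int = 16) -> tuple[float, float]:
    """Return (magnitude, phase) of x + jy using the CORDIC vectoring iteration."""
    z = 0.0
    # Pre-rotate by +/-pi so the vector lies in the right half-plane,
    # where the iterations converge (about +/-99.7 degrees).
    if x < 0:
        z = math.pi if y >= 0 else -math.pi
        x, y = -x, -y
    for i in range(iterations):
        t = 2.0 ** -i                  # a right shift by i bits in fixed-point hardware
        if y > 0:                      # rotate clockwise to drive y toward 0
            x, y, z = x + y * t, y - x * t, z + math.atan(t)
        else:                          # rotate counter-clockwise
            x, y, z = x - y * t, y + x * t, z - math.atan(t)
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); remove the total gain.
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(iterations))
    return x / gain, z
```

For example, `cordic_vectoring(3.0, 4.0)` returns approximately (5.0, 0.9273), matching `math.hypot(3, 4)` and `math.atan2(4, 3)` to within the angular resolution of the last iteration.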
Noise Estimation Block
In the noise estimation blocks, noise is estimated from the first few samples, during which only noise is present. Our design is adaptive, with the only constraint that the initial 1.25 ms of the input signal contains only noise, which is a fair assumption for speech communication. The block diagram of the magnitude and phase noise estimation blocks is shown in Fig. 5a.
The magnitude and phase spectra of the signal are passed to the magnitude and phase noise estimation blocks, respectively.
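As a quick check on the 1.25 ms constraint: at the two sampling rates used in Section 5 it corresponds to 20 samples (16 kHz) and 10 samples (8 kHz). The sketch below computes this and, as an assumption (the text does not specify the statistic), forms a noise level as the mean absolute value of those initial samples of a magnitude or phase stream. Both function names are hypothetical.

```python
import numpy as np

def initial_noise_samples(fs_hz: float, duration_s: float = 1.25e-3) -> int:
    """Number of leading samples assumed to contain noise only."""
    return int(round(fs_hz * duration_s))

def estimate_noise_level(stream, fs_hz: float, duration_s: float = 1.25e-3) -> float:
    """Hypothetical estimator: mean absolute value of the noise-only prefix."""
    n0 = initial_noise_samples(fs_hz, duration_s)
    prefix = np.abs(np.asarray(stream[:n0], dtype=float))
    return float(prefix.mean())
```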
Multi Band Separation Block
The magnitude and phase spectra of the signal also pass through the multi-band separation block, which divides each of them into four linearly spaced frequency bands. Four registers store the sample values of the four frequency bands until the reset signals are disabled. The multi-band separation block is shown in Fig. 5b.
Four controller blocks (Fig. 5b) divide the signal linearly into four frequency bands. Each controller is connected to the enable pin of one register. The reset pin of each of the first three registers is enabled when the enable pin of the immediately following register is enabled. The fourth register has no reset pin, so it carries the rest of the signal. When controller1 routes the signal into the first register, the reset pin is disabled and the sample values pass through the register into the first subtraction block, where the subtraction is performed. When controller2 becomes active, register2 is ready to accept samples, and controller2 also places register1 in reset mode to avoid overwriting the signal. When register4 becomes active, the rest of the signal passes through it. More frequency bands would require additional registers and controllers; however, we found that four frequency bands give better throughput in our design. This architecture is modeled in a parallel configuration. After the estimated noise spectrum has been subtracted from each band, the bands are recombined into a single full-band signal by a multi-band adder block. Before the four enhanced signals are added, the same controllers and registers used for separation restore the original time alignment of the signals.
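The controller/register scheme can be mimicked in software to see why the adder recovers the full-band signal: each sample is routed to exactly one register, and registers that are not enabled hold zero, so every band stays aligned with the original sample positions. The model below is a hypothetical sketch (`route_to_registers` and `recombine` are illustrative names); it captures the routing and alignment, not the register timing or reset signals.

```python
def route_to_registers(samples, n_bands=4):
    """Hypothetical software model of the controller/register band split.

    Controller i enables register i for one contiguous, equal-width run of
    samples; the last register has no reset and takes all remaining samples.
    Registers that are not enabled hold 0, keeping every band aligned with
    the original sample positions so that an adder can recombine them.
    """
    n = len(samples)
    width = max(n // n_bands, 1)
    registers = [[0.0] * n for _ in range(n_bands)]
    for k, s in enumerate(samples):
        active = min(k // width, n_bands - 1)   # which controller is currently enabled
        registers[active][k] = s
    return registers

def recombine(registers):
    """Multi-band adder: sum the aligned band outputs sample by sample."""
    return [sum(column) for column in zip(*registers)]
```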
Signal to Noise Ratio Computation
In the proposed architecture, the over-subtraction factor α may differ between frequency bands, since it depends on the signal-to-noise ratio of each band. The variation of α is described in equation (11). The SNR computation architecture is shown in Fig. 6a.
The maximum amplitudes of the signal and the noise are computed using relational, multiplexer and register blocks. The enable pin of each register is connected to the output of a controller so that each divided band keeps the same bandwidth. These are the same controllers that were used to separate the frequency bands; all four drive the SNR computation blocks to obtain the maximum amplitudes of the signal and the noise. After each sample comparison, the multiplexer output holds the running maximum, and the register, enabled by the controller, passes on the final maximum value of the samples. The maximum amplitudes of the signal and the noise then pass through a division block to give the SNR value. The proposed architecture requires four SNR computation blocks for the magnitude spectrum and four for the phase spectrum. The over-subtraction factor $\alpha_i$ of each band is calculated from the corresponding SNR value, and the estimated noise is then subtracted from each frequency band using its own $\alpha_i$.
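The relational/multiplexer/register chain and the divider described above compute a ratio of peak amplitudes. A hypothetical software model is given below; the dB conversion helper is an additional assumption, needed only if the ratio is to be compared against dB thresholds such as those of equation (11). Note that 20·log10 is used because the ratio is of amplitudes, not powers.

```python
import math

def peak_ratio_snr(signal_band, noise_band):
    """Ratio of peak amplitudes, modelling the running-maximum chain and the divider."""
    peak_signal = 0.0
    for v in signal_band:                 # relational block + multiplexer keep the running maximum
        peak_signal = max(peak_signal, abs(v))
    peak_noise = 0.0
    for v in noise_band:
        peak_noise = max(peak_noise, abs(v))
    return peak_signal / peak_noise       # division block (noise peak assumed non-zero)

def ratio_to_db(ratio):
    """Convert an amplitude ratio to dB (assumed step, not stated in the text)."""
    return 20.0 * math.log10(ratio)
```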
Subtraction Block
In the subtraction block, the estimated noise spectrum, scaled by the over-subtraction factor, is subtracted from the signal spectrum of each frequency band. The enhanced spectrum of the ith frequency band is:
The architecture of the proposed subtraction block is shown in Fig. 6b. A multiplier block scales the estimated noise spectrum by α, which depends on the signal-to-noise ratio, and a subtraction block produces the enhanced signal spectrum. The final enhanced spectrum appears at the output of the multiplexer. The proposed architecture requires four subtraction blocks for magnitude subtraction and four for phase subtraction, as shown in Fig. 3.
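The text does not say what the output multiplexer selects between. One plausible reading, assumed here, is a half-wave rectifier on the magnitude path (one of the modifications listed in Section 2): the multiplexer outputs the difference when it is non-negative and zero otherwise. Phase values may legitimately be negative, so rectification is disabled for that path in this sketch. `subtract_band` is a hypothetical name.

```python
def subtract_band(y, n_hat, alpha, rectify=True):
    """Hypothetical model of one subtraction block (multiplier, subtractor, multiplexer)."""
    scaled_noise = alpha * n_hat        # multiplier block: alpha_i * estimated noise
    diff = y - scaled_noise             # subtraction block
    if rectify and diff < 0.0:          # multiplexer selects 0 for a negative magnitude (assumed)
        return 0.0
    return diff                         # otherwise (or on the phase path) pass the difference through
```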
Signal Reconstruction Block
The enhanced magnitude and phase spectra are combined to reconstruct the signal, which requires an inverse Fourier transform over the magnitude and phase spectra. The enhanced phase spectrum is passed through the CORDIC SINCOS block [25] to obtain its cosine and sine (real and imaginary) components. Two multiplier blocks multiply the enhanced magnitude spectrum by these real and imaginary components. The IFFT block then reconstructs the signal and provides the enhanced speech signal. The reconstruction process is shown in Fig. 6c.
The IFFT block performs the inverse Fourier transform. As for the FFT, the pipelined input/output option was selected to obtain a pipelined implementation; for performance optimization of the IFFT block, the 4-multiplier structure is used and the phase factor is set to 8, and the signal is processed in non-overlapping windows of 256 samples. The CORDIC SINCOS block is configured in parallel mode for high throughput, with the pipelining mode set to maximum, the phase format set to radians and an output width of 16.
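The reconstruction path (sine/cosine of the phase, two multipliers, IFFT) can be checked in software with a round trip: splitting a spectrum into magnitude and phase and recombining it as |S|(cos φ + j sin φ) before the inverse FFT recovers the original frame. This NumPy model assumes a real-valued input frame, hence taking the real part of the IFFT output; `reconstruct_frame` is a hypothetical name.

```python
import numpy as np

def reconstruct_frame(mag, phase):
    """Model of the SINCOS + two multipliers + IFFT reconstruction stage."""
    real_part = mag * np.cos(phase)      # multiplier 1: |S| x cos(phase)
    imag_part = mag * np.sin(phase)      # multiplier 2: |S| x sin(phase)
    frame = np.fft.ifft(real_part + 1j * imag_part)
    return frame.real                    # imaginary residue is ~0 for a real input frame
```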
Performance Analysis
A Field Programmable Gate Array (FPGA) contains a matrix of reconfigurable logic circuitry. Because of this inherent parallelism, different operations do not have to compete for the same processing resources, so multiple control loops can run on a single FPGA device at different rates. The reconfigurability of FPGAs also provides considerable flexibility. Most real-time systems require fast processing, which present-day high-speed FPGAs can deliver. The hardware implementation described above was carried out on the Atlys Spartan-6 FPGA board (Xilinx Spartan-6 LX45 FPGA, 324-pin BGA package, 128 MB DDR2 with a 16-bit-wide data bus). Spartan-6 LX FPGAs are optimized for applications that require the absolute lowest cost. The Spartan-6 family provides up to 150K logic cells, integrated PCI Express blocks, advanced memory support, 390 MHz DSP slices and 3.2 Gbps low-power transceivers.
Except for the market, railway platform and train horn noises listed in Table 2, all sound sources were taken from [23] [24]; the market, railway platform and train horn noises were recorded in the respective environments. The football ground, market, car, railway platform, train horn and exhibition hall noisy signals were sampled at 16000 Hz, and the white, pink, cockpit, wind and factory noises were sampled at 8000 Hz. We implemented our design on an FPGA to exploit the parallel nature of the proposed architecture and to avoid the sequential execution of a software platform. To compare our design with the existing works [1] [7] [3], we also implemented [1] [7] [3] on the FPGA, because no hardware implementation of these methods was found in the respective literature.
The device utilization of our implementation is shown in Table 1. Table 3 gives the system delay, in which only the magnitude or the phase operations are taken into account because the two paths execute in parallel. The overall system delay of this design on the Xilinx Spartan-6 LX45 FPGA is 604 clock cycles; with a board clock frequency of 100 MHz, the proposed design therefore executes in 6.04 µs. Table 2 compares the signal-to-noise ratio of all four methods from low … (Fig. 7a). We observe that the proposed method gives the best result in every case, under all SNR conditions. The improvement for the train horn noise was less pronounced because of its high baseline. Time-scope outputs of the hardware implementations of MSS, MBMSS, MPSS and MBMPSS are shown in Fig. 7b. We therefore conclude that the proposed design outperforms the other existing methods, mainly in terms of SNR.
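The reported execution time follows directly from the cycle count: assuming the 604-unit delay is measured in cycles of the 100 MHz board clock (10 ns each), 604 × 10 ns = 6.04 µs. The helpers below (hypothetical names) reproduce this and, for scale, the duration of one 256-sample frame, which is 16 ms at 16 kHz and 32 ms at 8 kHz.

```python
def latency_us(cycles: int, clock_hz: float) -> float:
    """Latency in microseconds for a pipeline delay expressed in clock cycles."""
    return cycles / clock_hz * 1e6

def frame_duration_ms(frame_len: int, fs_hz: float) -> float:
    """Duration in milliseconds of one non-overlapping analysis frame."""
    return frame_len / fs_hz * 1e3
```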
Conclusion
In this paper, we have proposed a novel hardware design for speech enhancement based on the spectral subtraction algorithm. The subtraction is performed on both the magnitude and the phase spectra of the different frequency bands, which allows noise to be eliminated from high-SNR as well as low-SNR signals in the different frequency bands. The FPGA-based implementation of the proposed architecture gives better performance in terms of SNR and throughput than the existing MSS, MPSS and MBMSS architectures.
